LONG-TERM STABILITY OF SEQUENTIAL MONTE
CARLO METHODS UNDER VERIFIABLE CONDITIONS

By Randal Douc, Eric Moulines and Jimmy Olsson

This paper discusses particle filtering in general hidden Markov
models (HMMs) and presents novel theoretical results on the long-term
stability of bootstrap-type particle filters. More specifically, we
establish that the asymptotic variance of the Monte Carlo estimates
produced by the bootstrap filter is uniformly bounded in time. In
contrast to most previous results of this type, which in general
presuppose that the state space of the hidden state process is compact
(an assumption that is rarely satisfied in practice), our very mild
assumptions are satisfied for a large class of HMMs with possibly
non-compact state space. In addition, we derive a similar time uniform
bound on the asymptotic $L^p$ error. Importantly, our results hold for
misspecified models, i.e. we do not at all assume that the data
entering into the particle filter originate from the model governing the
dynamics of the particles, or even from an HMM.






1. Introduction. This paper deals with estimation in general hidden
Markov models (HMMs) via sequential Monte Carlo (SMC) methods (or
particle filters). More specifically, we present novel results on the numerical
stability of the bootstrap particle filter that hold under very general and
easily verifiable assumptions. Before stating the results we provide some
background.

Consider an HMM $(X_n, Y_n)_{n \in \mathbb{N}}$, where the Markov chain (or state
sequence) $(X_n)_{n \in \mathbb{N}}$, taking values in some general state space
$(\mathsf{X}, \mathcal{X})$, is only partially observed through the sequence
$(Y_n)_{n \in \mathbb{N}}$ of observations taking values in another general state
space $(\mathsf{Y}, \mathcal{Y})$. More specifically, conditionally on the state
sequence $(X_n)_{n \in \mathbb{N}}$, the observations are assumed to be
conditionally independent and such that the conditional distribution of each
$Y_n$ depends on the corresponding state $X_n$ only; see e.g. [2] and the
references therein. We denote by $Q$ and $\chi$ the kernel and initial
distribution of $(X_n)_{n \in \mathbb{N}}$, respectively. Even though $n$ is not
necessarily a temporal index, it will in the following be referred to as "time".

Any kind of statistical estimation in HMMs typically involves computation
of the conditional distribution of one or several hidden states given a set
of observations. Of particular interest are the so-called filter distributions,
where the filter distribution at time $n$ is defined as the conditional
distribution of $X_n$ given the corresponding observation history
$Y_0^n = (Y_0, \ldots, Y_n)$ (this will be our generic notation for vectors).
The problem of computing, recursively in $n$ and in a single sweep of the data,
the sequence of filter distributions is referred to as optimal filtering.
Alternatively, one may focus on the predictor distributions, where the predictor
distribution at time $n$ is defined as the conditional distribution of the state
$X_n$ given the preceding observation history $Y_0^{n-1}$; the predictor
distributions are in general obtained as a by-product when computing the filter
distributions and vice versa. In this paper we focus on computation of the
predictor distributions, which we denote by $\phi_\chi\langle Y_0^{n-1}\rangle$,
$n \in \mathbb{N}^*$ (a more precise definition of these measures is given in
Section 2). The filter recursion defines a measure-valued mapping $\Phi$
generating recursively the predictor distribution flow according to
$\phi_\chi\langle Y_0^n\rangle = \Phi\langle Y_n\rangle(\phi_\chi\langle Y_0^{n-1}\rangle)$
(we refer again to Section 2 for more precise definitions).

Unless the HMM is either a linear Gaussian model or a model comprising
only a finite number of possible states, exact numeric computation of
the predictor distributions is in general infeasible. Thus, one is in general
confined to using finite-dimensional approximations of these measures, and
in this paper we concentrate on the use of particle filters for this purpose.
A particle filter approximates the predictor distribution at time $n$ by the
empirical measure $\phi^N_\chi\langle Y_0^{n-1}\rangle$ associated with a finite
sample $(\xi_n^i)_{i=1}^N$ of particles evolving randomly and recursively in time.
Particle filters generally comprise two main operations: a mutation step and a
selection step. The mutation step randomly disseminates the particles in the
state space while the selection step duplicates or eliminates particles with
high or low posterior probability, respectively. The most basic algorithm
(proposed in [18] and referred to as the bootstrap particle filter) mutates the
particles according to the dynamics of the latent Markov chain and selects them
with probabilities proportional to the local likelihood of the mutated particles.
Thus, subjecting a particle sample $(\xi_n^i)_{i=1}^N$ to selection and mutation
is in the case of the bootstrap particle filter equivalent to drawing,
conditionally independently given $(\xi_n^i)_{i=1}^N$, new particles
$(\xi_{n+1}^i)_{i=1}^N$ from the distribution
$\Phi\langle Y_n\rangle(\phi^N_\chi\langle Y_0^{n-1}\rangle)$ obtained by plugging
the empirical measure $\phi^N_\chi\langle Y_0^{n-1}\rangle$ into the filter
recursion, which we denote

(1)    $(\xi_{n+1}^i)_{i=1}^N \overset{\text{i.i.d.}}{\sim} \Phi\langle Y_n\rangle(\phi^N_\chi\langle Y_0^{n-1}\rangle)^{\otimes N}.$

Since the seminal paper [18], particle filters have been successfully ap- 
plied to nonlinear filtering problems in many different fields; we refer to 
the collection [15] for an introduction to particle filtering in general and for 
miscellaneous examples of real-life applications. 

The theory of particle filtering is an active field and there is a number
of available convergence results concerning, e.g., $L^p$ error bounds and weak
convergence; see the monographs [5, 1] and the references therein. Most of
these results establish the convergence, as the number of particles $N$ tends
to infinity, of the particle filter for a fixed time step $n \in \mathbb{N}^*$.
For infinite time horizons, i.e. when $n$ tends to infinity, convergence is less
obvious. Indeed, each recursive update (1) of the particles $(\xi_n^i)_{i=1}^N$
is based on the implicit assumption that the empirical measure
$\phi^N_\chi\langle Y_0^{n-1}\rangle$ associated with the ancestor sample
approximates perfectly well the predictor $\phi_\chi\langle Y_0^{n-1}\rangle$ at
the previous time step; however, since the ancestor sample is marred by an
error itself, one may expect that the errors induced at the different updating
steps accumulate and, consequently, that the total error propagated through
the algorithm increases with $n$. This would make the algorithm useless in
practice. Fortunately, it has been observed empirically by several authors
(see e.g. [30, Section 1.1]) that the convergence of particle filters appears to
be uniform in time also for very general HMMs. Nevertheless, even though
long-term stability is essential for the applicability of particle filters, most
existing time uniform convergence results are obtained under assumptions
that are generally not met in practice. The aim of the present paper is thus
to establish the infinite time-horizon stability under mild and easily verifiable
assumptions, satisfied by most models for which the particle filter has been
found to be useful.

1.1. Previous work. To our knowledge, the first time uniform convergence
result for bootstrap-type particle filters was obtained by [7] (see also
the book [5] for refinements) using a technique based on the uniform
forgetting of the predictor distribution. We recall this technique in some
detail. By writing

$$\phi^N_\chi\langle Y_0^n\rangle - \phi_\chi\langle Y_0^n\rangle
= \underbrace{\phi^N_\chi\langle Y_0^n\rangle - \Phi\langle Y_n\rangle(\phi^N_\chi\langle Y_0^{n-1}\rangle)}_{\text{sampling error}}
+ \underbrace{\Phi\langle Y_n\rangle(\phi^N_\chi\langle Y_0^{n-1}\rangle) - \Phi\langle Y_n\rangle(\phi_\chi\langle Y_0^{n-1}\rangle)}_{\text{initialization error}}$$

one may decompose the error $\phi^N_\chi\langle Y_0^n\rangle - \phi_\chi\langle Y_0^n\rangle$
into a first error (the sampling error) introduced by replacing
$\Phi\langle Y_n\rangle(\phi^N_\chi\langle Y_0^{n-1}\rangle)$ by its empirical
estimate $\phi^N_\chi\langle Y_0^n\rangle$ and a second error (the initialization
error) originating from the discrepancy between the empirical measure
$\phi^N_\chi\langle Y_0^{n-1}\rangle$ associated with the ancestor particles and
the true predictor $\phi_\chi\langle Y_0^{n-1}\rangle$. The sampling error is
easy to control. One may for example use the Marcinkiewicz-Zygmund inequality
to bound the $L^p$ error by $cN^{-1/2}$, where $c \in \mathbb{R}_+^*$ is a
universal constant. Exponential deviation inequalities may also be obtained. For
the initialization error, we may expect that the mapping $\Phi\langle Y_n\rangle$
is in some sense contracting and thus downscales the discrepancy between
$\phi^N_\chi\langle Y_0^{n-1}\rangle$ and $\phi_\chi\langle Y_0^{n-1}\rangle$. This
is the point where the exponential forgetting of the predictor distribution
becomes crucial. Assume for instance that there exists a constant
$\rho \in\, ]0, 1[$ such that
$\|\Phi\langle Y_m^n\rangle(\mu) - \Phi\langle Y_m^n\rangle(\nu)\| \le \rho^{n-m+1}\|\mu - \nu\|$
for any integers $0 \le m \le n$ and any probability measures $\mu$ and $\nu$,
where $\|\cdot\|$ is some suitable norm on the space of probability measures and
$\Phi\langle Y_m^n\rangle = \Phi\langle Y_n\rangle \circ \Phi\langle Y_{n-1}\rangle \circ \cdots \circ \Phi\langle Y_m\rangle$.
Since $\Phi\langle Y_m^n\rangle(\mu)$ is the predictor distribution
$\phi_\mu\langle Y_m^n\rangle$ obtained when the hidden chain is initialized with
the distribution $\mu$ at time $m$, this means that the predictor distribution
forgets the initial distribution geometrically fast. In addition, the forgetting
rate $\rho$ is uniform with respect to the observations. The uniformity with
respect to the observations is of course the main reason why the assumptions on
the model are so stringent.

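Now, decomposing similarly also the initialization error and proceeding
recursively yields the telescoping sum

(2)    $$\phi^N_\chi\langle Y_0^n\rangle - \phi_\chi\langle Y_0^n\rangle
= \phi^N_\chi\langle Y_0^n\rangle - \Phi\langle Y_n\rangle(\phi^N_\chi\langle Y_0^{n-1}\rangle)
+ \sum_{k=1}^{n-1}\left\{\Phi\langle Y_{k+1}^n\rangle(\phi^N_\chi\langle Y_0^k\rangle) - \Phi\langle Y_{k+1}^n\rangle \circ \Phi\langle Y_k\rangle(\phi^N_\chi\langle Y_0^{k-1}\rangle)\right\}
+ \Phi\langle Y_1^n\rangle(\phi^N_\chi\langle Y_0\rangle) - \Phi\langle Y_1^n\rangle(\phi_\chi\langle Y_0\rangle).$$

Now each term of the sum above can be viewed as a downscaling (by a factor
$\rho^{n-k}$) of the sampling error between $\phi^N_\chi\langle Y_0^k\rangle$ and
$\Phi\langle Y_k\rangle(\phi^N_\chi\langle Y_0^{k-1}\rangle)$ through the
contraction of $\Phi\langle Y_{k+1}^n\rangle$. Denoting by $\delta_n$ the $L^p$
error of $\phi^N_\chi\langle Y_0^n\rangle$ and assuming that the initial sample is
obtained through standard importance sampling, implying that
$\delta_0 \le cN^{-1/2}$, yields, in outline and using the contraction of
$\Phi\langle Y_{k+1}^n\rangle$, the uniform $L^p$ error bound
$\delta_n \le cN^{-1/2}\sum_{k=0}^{n}\rho^{n-k} \le cN^{-1/2}(1-\rho)^{-1}$.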
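The geometric-sum bound above can be checked numerically. The following is a minimal sketch, assuming hypothetical values for the constant $c$, the particle number $N$ and the forgetting rate $\rho$ (none of them come from the paper); it iterates the worst-case recursion $\delta_n = cN^{-1/2} + \rho\,\delta_{n-1}$, whose solution is exactly $cN^{-1/2}\sum_{k=0}^{n}\rho^{n-k}$, and compares it with the time uniform limit $cN^{-1/2}(1-\rho)^{-1}$.

```python
import math


def uniform_error_bound(c: float, n_particles: int, rho: float, horizon: int) -> list:
    """Iterate delta_n = c * N^{-1/2} + rho * delta_{n-1}, starting from delta_0 = c * N^{-1/2}.

    This is the worst case of the telescoping argument: every step adds one
    fresh sampling error and contracts the accumulated error by rho.
    """
    sampling_error = c / math.sqrt(n_particles)
    deltas = [sampling_error]  # delta_0 <= c N^{-1/2}
    for _ in range(horizon):
        deltas.append(sampling_error + rho * deltas[-1])
    return deltas


# Hypothetical illustration values (assumptions, not taken from the paper).
c, N, rho = 1.0, 1000, 0.9
deltas = uniform_error_bound(c, N, rho, horizon=200)
limit = c / math.sqrt(N) / (1.0 - rho)  # c N^{-1/2} (1 - rho)^{-1}

print(f"delta_200 = {deltas[-1]:.6f}, uniform bound = {limit:.6f}")
```

The accumulated error increases with $n$ but never exceeds the limit, which is the sense in which the bound is uniform in time.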

Even though this result is often used as a general guideline on particle filter
stability, it nevertheless relies heavily on the assumption that the kernel $Q$
of the hidden Markov chain satisfies the following strong mixing condition, which
is even more stringent than the already very strong one-step global Doeblin
condition: there exist constants $\epsilon^+ > \epsilon^- > 0$ and a probability
measure $\nu$ on $(\mathsf{X}, \mathcal{X})$ such that for all $x \in \mathsf{X}$
and $A \in \mathcal{X}$,

(3)    $\epsilon^- \nu(A) \le Q(x, A) \le \epsilon^+ \nu(A).$

This assumption, which in particular implies that the Markov chain is
uniformly geometrically ergodic, restricts the applicability of the stability
result in question to models where the state space $\mathsf{X}$ is small (for
Markov chains on separable metric spaces, provided that the kernel is strongly
Feller, the condition (3) typically requires the state space to be compact). Some
refinements have been obtained in e.g. [23, 22, 5, 25, 29, 2, 24, 14, 4, 19].

imsart-aap ver. 2011/11/15 file: dmo2011.tex date: March 1, 2013
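To make condition (3) concrete, here is a small sketch for the special case of a finite state space, where the kernel $Q$ is a row-stochastic matrix and $\nu$ a probability vector with full support. In that case (3) holds for all sets $A$ exactly when it holds for singletons, so the best constants are the smallest and largest ratios $Q(x,\{j\})/\nu(\{j\})$. The function name and the two example matrices are illustrative assumptions, not objects from the paper.

```python
import numpy as np


def mixing_constants(Q: np.ndarray, nu: np.ndarray) -> tuple:
    """Best (eps_minus, eps_plus) with eps_minus * nu(A) <= Q(x, A) <= eps_plus * nu(A).

    Assumes a finite state space and nu > 0 componentwise, so that it suffices
    to compare Q and nu on singletons.
    """
    ratios = Q / nu[None, :]  # Q(x, {j}) / nu({j}) for every pair (x, j)
    return float(ratios.min()), float(ratios.max())


nu = np.full(3, 1.0 / 3.0)  # uniform reference measure (an assumption)

# A kernel with strictly positive entries: (3) holds with eps_minus > 0.
Q_dense = np.array([[0.5, 0.3, 0.2],
                    [0.2, 0.5, 0.3],
                    [0.3, 0.2, 0.5]])
eps_minus, eps_plus = mixing_constants(Q_dense, nu)

# A nearest-neighbour kernel: some transitions are impossible in one step,
# so the lower bound in (3) fails (eps_minus = 0) for this nu.
Q_banded = np.array([[0.5, 0.5, 0.0],
                     [0.25, 0.5, 0.25],
                     [0.0, 0.5, 0.5]])
eps_minus_banded, _ = mixing_constants(Q_banded, nu)

print(eps_minus, eps_plus, eps_minus_banded)
```

The banded example hints at why (3) is restrictive: as soon as some region cannot be reached in one step from some starting point, the lower bound breaks down.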

The long-term stability of particle filters is also related to the boundedness
of the asymptotic variance. The first central limit theorem (CLT) for
bootstrap-type particle filters was derived by [6]. More specifically, it was
shown that the normalized Monte Carlo error
$\sqrt{N}(\phi^N_\chi\langle Y_0^{n-1}\rangle h - \phi_\chi\langle Y_0^{n-1}\rangle h)$
tends weakly, for a fixed $n \in \mathbb{N}^*$ and as the particle population size
$N$ tends to infinity, to a zero mean normal-distributed variable with variance
$\sigma^2_\chi\langle Y_0^{n-1}\rangle(h)$. Here we have used the notation
$\mu h = \int h(x)\,\mu(dx)$ to denote expectations. The original proof of the CLT
was later simplified and extended to more general particle filtering algorithms
in [21, 3, 12, 14, 16]; in Section 2 we recall in detail the version obtained in
[12] and provide an explicit expression of the asymptotic variance
$\sigma^2_\chi\langle Y_0^{n-1}\rangle(h)$. As shown first by [7, Theorem 3.1], it
is possible, using the strong mixing assumption described above, to bound
uniformly also the asymptotic variance $\sigma^2_\chi\langle Y_0^{n-1}\rangle(h)$
by similar forgetting-based arguments. Here a key ingredient is that the
particles $(\xi_n^i)_{i=1}^N$ obtained at the different time steps become,
asymptotically as $N$ tends to infinity, statistically independent.
Consequently, the total asymptotic variance of
$\sqrt{N}(\phi^N_\chi\langle Y_0^{n-1}\rangle h - \phi_\chi\langle Y_0^{n-1}\rangle h)$
is obtained by simply summing up the asymptotic variances of the error terms
$\sqrt{N}(\Phi\langle Y_{k+1}^n\rangle(\phi^N_\chi\langle Y_0^k\rangle)h - \Phi\langle Y_{k+1}^n\rangle \circ \Phi\langle Y_k\rangle(\phi^N_\chi\langle Y_0^{k-1}\rangle)h)$
in the decomposition (2). Finally, applying again the contraction of the composed
mapping $\Phi\langle Y_{k+1}^n\rangle$ yields a uniform bound on the total
asymptotic variance in accordance with the calculation above. In [10], a similar
stability result was obtained for a particle-based version of the
forward-filtering backward-simulation algorithm (proposed in [17]); nevertheless,
the analysis of this work also relies completely on the assumption of strong
mixing of the latent Markov chain, which, as already pointed out, does not hold
for most models used in practice.

A first breakthrough towards stability results for non-compact state spaces
was made in [30]. This work establishes, again for bootstrap-type particle
filters, a uniform time average convergence result of the form

(4)    $$\lim_{N \to \infty} \sup_{n \in \mathbb{N}^*} \mathbb{E}\left( n^{-1} \sum_{k=1}^{n} \left\|\phi^N_\chi\langle Y_0^k\rangle - \phi_\chi\langle Y_0^k\rangle\right\|_{\mathrm{BL}} \right) = 0,$$

where $\|\cdot\|_{\mathrm{BL}}$ denotes the dual bounded-Lipschitz norm and
$\phi_\chi\langle Y_0^k\rangle$ denotes the filter distribution at time $k$. This
result, obtained as a special case of a general approximation theorem derived in
the same paper, was established under very weak assumptions on the local
likelihood (supposed to be bounded and continuous) and the Markov kernel
(supposed to be Feller). These assumptions are, together with the basic
assumption that the hidden Markov chain is positive Harris and aperiodic,
satisfied for a large class of HMMs with possibly non-compact state spaces.
Nevertheless, the proof is heavily based on the assumption that the particles
evolve according to exactly the same model dynamics as the observations entered
into the algorithm, in other words, that the model is perfectly specified. This
is of course never true in practice. In addition, the convergence result (4) does
not, in contrast to $L^p$ bounds and CLTs, provide a rate of convergence of the
algorithm.

1.2. Approach of this paper. In this paper we return to more standard
convergence modes and reconsider the asymptotic variance and $L^p$ error
of bootstrap particle filters. As noticed by [16], restricting the analysis to
bootstrap-type particle filters does not imply a significant loss of generality,
as the CLT for more general auxiliary particle filters [26] can be
straightforwardly obtained by applying the bootstrap filter CLT to a somewhat
modified HMM incorporating the so-called adjustment multiplier weights of
the auxiliary particle filter into the model dynamics. Our aim is to establish
that the asymptotic variance and $L^p$ error are stochastically bounded in
the non-compact case. Recall that a sequence $(\mu_n)_{n \in \mathbb{N}}$ of
probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is tight if for all
$\epsilon > 0$ there exists a compact interval $I = [-a, a] \subset \mathbb{R}$
such that $\mu_n(I^c) \le \epsilon$ for all $n$. In addition, we call a sequence
$(Z_n)_{n \in \mathbb{N}}$ of random variables, with $Z_n \sim \mu_n$, tight if the
sequence $(\mu_n)_{n \in \mathbb{N}}$ of marginal distributions is tight. In this
paper, we show that the sequence
$(\sigma^2_\chi\langle Y_0^{n-1}\rangle(h))_{n \in \mathbb{N}^*}$ of asymptotic
variances is tight for any stationary sequence $(Y_n)_{n \in \mathbb{N}}$ of
observations. In particular, we do not at all assume that the observations
originate from the model governing the dynamics of the particle filter, or even
from an HMM.

Our proofs are based on novel coupling techniques developed in [13] (and
going back to [20] and [9]) with the purpose of establishing the convergence
of the relative entropy for misspecified HMMs. In our analysis, the strong
mixing assumption (3) is replaced by the considerably weaker $r$-local Doeblin
condition (14). This assumption is, for instance, trivially satisfied (for
$r = 1$) if there exist a measurable set $C \subseteq \mathsf{X}$, a probability
measure $\lambda_C$ on $(\mathsf{X}, \mathcal{X})$ such that $\lambda_C(C) = 1$,
and positive constants $0 < \epsilon_C^- < \epsilon_C^+$ such that for all
$x \in \mathsf{X}$ and all $A \in \mathcal{X}$,

(5)    $\epsilon_C^- \lambda_C(A) \le Q(x, A \cap C) \le \epsilon_C^+ \lambda_C(A),$

a condition that is easily verified for many HMMs with non-compact state
space (we emphasize however that the assumption (14) is even weaker than (5)).

To sum up, the contribution of the present paper is twofold, since

• we present time uniform bounds that also provide the rate of convergence
in $N$ of the particle filter for very general HMMs (with possibly
non-compact state space);

• we establish long-term stability of the particle filter also in the case
of misspecification, i.e. when the stationary law of the observations
entering the particle filter differs from that of the HMM governing the
dynamics of the particles $(\xi_n^i)_{i=1}^N$.

1.3. Outline of the paper. The paper is organized as follows. Section 2 
provides the main notation and definitions. It also introduces the concepts of 
HMMs and bootstrap particle filters. In Section 3 our main results are stated 
together with the main layouts of the proofs. Section 4 treats some examples 
and Section 5 and Appendix A provide the full details of our proofs. 

2. Preliminaries. 

2.1. Notation. We preface the introduction of HMMs by some notation.
Let $(\mathsf{X}, \mathcal{X})$ be a measurable space, where $\mathcal{X}$ is a
countably generated $\sigma$-field. Denote by $\mathcal{F}(\mathsf{X})$ (resp.
$\mathcal{F}_+(\mathsf{X})$) the set of bounded (resp. bounded and positive)
$\mathcal{X}/\mathcal{B}(\mathbb{R})$-measurable functions on $\mathsf{X}$ and by
$\mathcal{P}(\mathsf{X}, \mathcal{X})$ the set of probability measures on
$(\mathsf{X}, \mathcal{X})$. Let $K : \mathsf{X} \times \mathcal{X} \to \mathbb{R}_+$
be a finite kernel on $\mathsf{X}$, i.e. for each $x \in \mathsf{X}$, the mapping
$K(x, \cdot) : A \mapsto K(x, A)$ is a finite measure on $\mathcal{X}$ and for each
$A \in \mathcal{X}$, the function $K(\cdot, A) : x \mapsto K(x, A)$ is
$\mathcal{X}/\mathcal{B}([0, 1])$-measurable. If $K(x, \cdot)$ is a probability
measure on $(\mathsf{X}, \mathcal{X})$ for all $x \in \mathsf{X}$, then the kernel
$K$ is said to be Markov. A kernel induces two integral operators, the first
acting on the space $\mathcal{M}(\mathsf{X}, \mathcal{X})$ of $\sigma$-finite
measures on $(\mathsf{X}, \mathcal{X})$ and the other on $\mathcal{F}(\mathsf{X})$.
More specifically, for $\mu \in \mathcal{M}(\mathsf{X}, \mathcal{X})$ and
$f \in \mathcal{F}(\mathsf{X})$ we define the measure

$$\mu K : \mathcal{X} \ni A \mapsto \int K(x, A)\,\mu(dx)$$

and the function

$$K f : \mathsf{X} \ni x \mapsto \int f(x')\,K(x, dx').$$

Moreover, the composition (or product) of two kernels $K$ and $M$ on
$\mathsf{X}$ is defined as

$$KM : \mathsf{X} \times \mathcal{X} \ni (x, A) \mapsto \int M(x', A)\,K(x, dx').$$
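On a finite state space these operators reduce to matrix algebra, which may help fix the conventions: a kernel is a matrix with rows indexed by the starting point, a measure is a row vector acting from the left, and a function is a column vector acting from the right. The sketch below uses made-up two-state kernels purely for illustration.

```python
import numpy as np

# Two Markov kernels on the state space {0, 1}; rows are K(x, .) and M(x, .).
# The numerical values are illustrative assumptions.
K = np.array([[0.9, 0.1],
              [0.4, 0.6]])
M = np.array([[0.5, 0.5],
              [0.2, 0.8]])

mu = np.array([0.3, 0.7])   # a probability measure on {0, 1}
f = np.array([1.0, -2.0])   # a bounded function on {0, 1}

mu_K = mu @ K   # the measure  mu K : A -> sum_x K(x, A) mu(x)
K_f = K @ f     # the function K f  : x -> sum_x' f(x') K(x, x')
KM = K @ M      # the kernel   KM   : (x, A) -> sum_x' M(x', A) K(x, x')

# Integrating f against mu K equals integrating K f against mu.
lhs = mu_K @ f
rhs = mu @ K_f
print(mu_K, K_f, lhs, rhs)
```

The last two quantities coincide, which is the identity $(\mu K) f = \mu (K f)$ that justifies writing $\mu K f$ without parentheses.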

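2.2. Hidden Markov models. Let $(\mathsf{X}, \mathcal{X})$ and
$(\mathsf{Y}, \mathcal{Y})$ be two measurable spaces. We specify the HMM as
follows. Let $Q : \mathsf{X} \times \mathcal{X} \to [0, 1]$ and
$G : \mathsf{X} \times \mathcal{Y} \to [0, 1]$ be given Markov kernels and let
$\chi$ be a given initial distribution on $(\mathsf{X}, \mathcal{X})$. In this
setting, define the Markov kernel

$$T((x, y), A) \triangleq \iint \mathbb{1}_A(x', y')\,Q(x, dx')\,G(x', dy'), \qquad (x, y) \in \mathsf{X} \times \mathsf{Y},\ A \in \mathcal{X} \otimes \mathcal{Y},$$

on the product space $(\mathsf{X} \times \mathsf{Y}, \mathcal{X} \otimes \mathcal{Y})$.
Let $(X_n, Y_n)_{n \in \mathbb{N}}$ be the canonical Markov chain induced by $T$ and
the initial distribution
$\mathcal{X} \otimes \mathcal{Y} \ni A \mapsto \iint \mathbb{1}_A(x, y)\,\chi(dx)\,G(x, dy)$.
The bivariate process $(X_n, Y_n)_{n \in \mathbb{N}}$ is what we refer to as the
HMM. We shall denote by $\mathbb{P}_\chi$ and $\mathbb{E}_\chi$ the probability
measure and corresponding expectation associated with the HMM on the canonical
space $((\mathsf{X} \times \mathsf{Y})^{\mathbb{N}}, (\mathcal{X} \otimes \mathcal{Y})^{\otimes \mathbb{N}})$.
We assume that the observation kernel $G$ is non-degenerate in the sense that
there exist a $\sigma$-finite measure $\nu$ on $(\mathsf{Y}, \mathcal{Y})$ and a
measurable function $g : \mathsf{X} \times \mathsf{Y} \to\, ]0, \infty[$ such that

$$G(x, A) = \int \mathbb{1}_A(y)\,g(x, y)\,\nu(dy), \qquad x \in \mathsf{X},\ A \in \mathcal{Y}.$$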

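When operating on HMMs we are in general interested in computing
expectations of type $\mathbb{E}_\chi(h(X_k^\ell) \mid Y_0^m)$ for integers
$(k, \ell, m) \in \mathbb{N}^3$ with $k \le \ell$ and functions
$h \in \mathcal{F}(\mathsf{X}^{\ell-k+1})$. Of particular interest are quantities of
the form $\mathbb{E}_\chi(h(X_n) \mid Y_0^{n-1})$ or
$\mathbb{E}_\chi(h(X_n) \mid Y_0^n)$, and the term optimal filtering refers to the
problem of computing, recursively in $n$, such conditional distributions and
expectations as new data becomes available. As mentioned in the introduction,
we will focus on online computation of expectations of the former type.
For any record $y_k^m \in \mathsf{Y}^{m-k+1}$ of observations, let
$\mathbf{L}\langle y_k^m\rangle$ be the unnormalized kernel on
$(\mathsf{X}, \mathcal{X})$ defined by

(6)    $$\mathbf{L}\langle y_k^m\rangle(x_k, A) \triangleq \int \cdots \int \mathbb{1}_A(x_{m+1}) \prod_{\ell=k}^{m} g(x_\ell, y_\ell)\,Q(x_\ell, dx_{\ell+1}), \qquad x_k \in \mathsf{X},\ A \in \mathcal{X},$$

with the convention

(7)    $\mathbf{L}\langle y_k^m\rangle(x, A) \triangleq \delta_x(A)$ for $k > m$

(where $\delta_x$ denotes the Dirac mass at point $x$). Note that the function
$y_0^{n-1} \mapsto \chi\mathbf{L}\langle y_0^{n-1}\rangle\mathbb{1}_{\mathsf{X}}$ is
exactly the density of the observations $Y_0^{n-1}$ (i.e. the likelihood
function) with respect to $\nu^{\otimes n}$. Also note that for any
$\ell \in \{k, \ldots, m-1\}$,

(8)    $\mathbf{L}\langle y_k^m\rangle = \mathbf{L}\langle y_k^\ell\rangle\,\mathbf{L}\langle y_{\ell+1}^m\rangle.$

Let $\phi_\chi\langle y_k^m\rangle$ be the probability measure defined by

(9)    $$\phi_\chi\langle y_k^m\rangle(A) \triangleq \frac{\chi\mathbf{L}\langle y_k^m\rangle\mathbb{1}_A}{\chi\mathbf{L}\langle y_k^m\rangle\mathbb{1}_{\mathsf{X}}}, \qquad A \in \mathcal{X}.$$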
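Note that this implies that $\phi_\chi\langle y_k^m\rangle = \chi$ when $k > m$.
Using this notation, it can be shown (see e.g. [2, Proposition 3.1.4]) that for
any $h \in \mathcal{F}(\mathsf{X})$,

$$\mathbb{E}_\chi\left(h(X_n) \mid Y_0^{n-1}\right) = \int h(x)\,\phi_\chi\langle Y_0^{n-1}\rangle(dx),$$

i.e. $\phi_\chi\langle Y_0^{n-1}\rangle$ is the predictor of $X_n$ given the
observations $Y_0^{n-1}$. From the definition (9) one immediately obtains the
recursion

$$\phi_\chi\langle y_0^n\rangle(A)
= \frac{\phi_\chi\langle y_0^{n-1}\rangle\mathbf{L}\langle y_n\rangle\mathbb{1}_A}{\phi_\chi\langle y_0^{n-1}\rangle\mathbf{L}\langle y_n\rangle\mathbb{1}_{\mathsf{X}}}
= \frac{\int g(x, y_n)\,Q(x, A)\,\phi_\chi\langle y_0^{n-1}\rangle(dx)}{\int g(x, y_n)\,\phi_\chi\langle y_0^{n-1}\rangle(dx)},$$

which can be expressed in condensed form as

(10)    $\phi_\chi\langle y_0^n\rangle = \Phi\langle y_n\rangle(\phi_\chi\langle y_0^{n-1}\rangle),$

where $\Phi\langle y_n\rangle$ transforms a probability measure
$\mu \in \mathcal{P}(\mathsf{X}, \mathcal{X})$ into the measure

$$\Phi\langle y_n\rangle(\mu) : \mathcal{X} \ni A \mapsto \frac{\int g(x, y_n)\,Q(x, A)\,\mu(dx)}{\int g(x, y_n)\,\mu(dx)}.$$

As mentioned in the introduction, the recursion (10) cannot in general be
solved in closed form. In the following section we discuss how approximate
solutions to (10) can be obtained using particle filters, with focus set on the
bootstrap particle filter proposed in [18].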
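For a finite state space the recursion (10) can be evaluated exactly, which gives a useful reference point before turning to particle approximations. The sketch below is an illustration under that assumption: `mu` is the current predictor as a probability vector, `g_y` holds the local likelihood values $g(x, y_n)$ for the observed $y_n$, and `Q` is the transition matrix. The function name and numerical values are hypothetical.

```python
import numpy as np


def predictor_update(mu: np.ndarray, g_y: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Apply the map Phi<y_n> of recursion (10) on a finite state space.

    mu  : current predictor, shape (d,)
    g_y : local likelihood g(x, y_n) for each state x, shape (d,)
    Q   : transition matrix with rows Q(x, .), shape (d, d)
    """
    weighted = mu * g_y                     # g(x, y_n) mu(dx)
    return (weighted @ Q) / weighted.sum()  # numerator / normalising constant


# Illustrative two-state example (values are assumptions).
mu0 = np.array([0.5, 0.5])
g_y = np.array([0.9, 0.2])
Q = np.array([[0.9, 0.1],
              [0.3, 0.7]])

mu1 = predictor_update(mu0, g_y, Q)
print(mu1)
```

The denominator `weighted.sum()` is the finite-state version of $\int g(x, y_n)\,\mu(dx)$, so the output is again a probability vector.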
2.3. The bootstrap particle filter. In the following we assume that all
random variables are defined on a common probability space
$(\Omega, \mathcal{A}, \mathbb{P})$. The bootstrap particle filter updates
sequentially a set of weighted simulations in order to approximate online the
flow of the predictor distributions. In order to describe precisely how this is
done for a given sequence $(y_n)_{n \in \mathbb{N}}$ of observations we proceed
inductively and assume that we are given a sample of $\mathsf{X}$-valued random
draws $(\xi_n^i)_{i=1}^N$ (the particles) such that the empirical measure

$$\phi^N_\chi\langle y_0^{n-1}\rangle = \frac{1}{N}\sum_{i=1}^{N} \delta_{\xi_n^i}$$

associated with these draws targets the predictor
$\phi_\chi\langle y_0^{n-1}\rangle$ in the sense that
$\phi^N_\chi\langle y_0^{n-1}\rangle h = \sum_{i=1}^{N} h(\xi_n^i)/N$ estimates
$\phi_\chi\langle y_0^{n-1}\rangle h$ for any $h \in \mathcal{F}(\mathsf{X})$. In
order to form a new particle sample $(\xi_{n+1}^i)_{i=1}^N$ approximating the
predictor $\phi_\chi\langle y_0^n\rangle$ at the subsequent time step, we replace,
in (10), the true predictor $\phi_\chi\langle y_0^{n-1}\rangle$ by the particle
estimate $\phi^N_\chi\langle y_0^{n-1}\rangle$. This yields the approximation

(11)    $$\phi_\chi\langle y_0^n\rangle(A) \approx \sum_{i=1}^{N} \frac{g(\xi_n^i, y_n)}{\sum_{\ell=1}^{N} g(\xi_n^\ell, y_n)}\,Q(\xi_n^i, A), \qquad A \in \mathcal{X}.$$

Next, the sample $(\xi_{n+1}^i)_{i=1}^N$ is generated by simulating $N$
conditionally independent draws from the mixture in (11) using the following
algorithm.



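    set $\Omega_n \leftarrow 0$
    for $i = 1 \to N$ do
        set $\omega_n^i \leftarrow g(\xi_n^i, y_n)$
        set $\Omega_n \leftarrow \Omega_n + \omega_n^i$
    end for
    for $i = 1 \to N$ do
        draw $I_n^i \sim (\omega_n^\ell/\Omega_n)_{\ell=1}^N$
        draw $\xi_{n+1}^i \sim Q(\xi_n^{I_n^i}, \cdot)$
    end for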







In the scheme above, the operation $\sim$ means implicitly that all draws (for
different $i$'s) are conditionally independent. Moreover, the operation
$I_n^i \sim (\omega_n^\ell/\Omega_n)_{\ell=1}^N$ means that each index $I_n^i$ is
simulated according to the discrete probability distribution generated by the
normalized importance weights $(\omega_n^\ell/\Omega_n)_{\ell=1}^N$. The algorithm
is typically initialized by drawing $N$ i.i.d. particles $(\xi_0^i)_{i=1}^N$ from
the initial distribution $\chi$ and letting $\sum_{i=1}^{N} \delta_{\xi_0^i}/N$ be
an estimate of $\chi$.
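The selection and mutation sweep described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy model (a Gaussian autoregression observed in Gaussian noise), the observation values and all function names are assumptions introduced here. The function `g` plays the role of the local likelihood $g(x, y)$ and `sample_q` draws from $Q(x, \cdot)$.

```python
import numpy as np


def bootstrap_step(particles, y, g, sample_q, rng):
    """One selection + mutation sweep of the bootstrap particle filter."""
    weights = g(particles, y)                    # omega_n^i = g(xi_n^i, y_n)
    probs = weights / weights.sum()              # omega_n^i / Omega_n
    n = len(particles)
    ancestors = rng.choice(n, size=n, p=probs)   # indices I_n^i, drawn independently
    return sample_q(particles[ancestors], rng)   # xi_{n+1}^i ~ Q(xi_n^{I_n^i}, .)


# Hypothetical toy model (an assumption, not from the paper):
# X_{n+1} = 0.8 X_n + N(0, 1) and Y_n = X_n + N(0, 1).
def g(x, y):
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2.0 * np.pi)


def sample_q(x, rng):
    return 0.8 * x + rng.normal(size=x.shape)


rng = np.random.default_rng(0)
particles = rng.normal(size=500)    # xi_0^i i.i.d. from an assumed chi = N(0, 1)
for y in [0.3, -0.1, 1.2]:          # made-up observations y_0, y_1, y_2
    particles = bootstrap_step(particles, y, g, sample_q, rng)

# Particle estimate of the predictor expectation of the bounded
# function h(x) = 1{x > 0} after three updates.
estimate = np.mean(particles > 0.0)
print(estimate)
```

Multinomial resampling via `rng.choice` mirrors the scheme above directly; lower-variance resampling schemes exist but fall outside the algorithm analysed here.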

As mentioned in the introduction, the asymptotic properties, as the number
$N$ of particles tends to infinity, of the bootstrap particle filter output
are well investigated. Concerning weak convergence, [6] established
the following CLT. Define for $h \in \mathcal{F}(\mathsf{X})$,

Theorem 1 ([6]). For all $h \in \mathcal{F}(\mathsf{X})$ and
$y_0^{n-1} \in \mathsf{Y}^n$ it holds, as $N \to \infty$,

(13)    $$\sqrt{N}\left(\phi^N_\chi\langle y_0^{n-1}\rangle h - \phi_\chi\langle y_0^{n-1}\rangle h\right) \overset{\mathcal{D}}{\longrightarrow} \sigma_\chi\langle y_0^{n-1}\rangle(h)\,Z,$$

where $\sigma_\chi\langle y_0^{n-1}\rangle(h)$ is defined in (12) and $Z$ is a
standard normal-distributed random variable.

When the observations $(Y_n)_{n \in \mathbb{N}}$ entering the particle filter are
random, the sequence $(\sigma^2_\chi\langle Y_0^{n-1}\rangle(h))_{n \in \mathbb{N}^*}$
of asymptotic variances is an $(\mathcal{F}_n^Y)_{n \in \mathbb{N}}$-adapted
stochastic process, where $(\mathcal{F}_n^Y)_{n \in \mathbb{N}}$ is the natural
filtration of the observation process. The aim of the next section is to
establish that this sequence is tight. Importantly, we assume in the following
that the observations $(Y_n)_{n \in \mathbb{N}}$ entering the particle filter
algorithm form an arbitrary $\mathbb{P}$-stationary sequence taking values in
$\mathsf{Y}$. The stationary process $(Y_n)_{n \in \mathbb{N}}$ can be embedded into
a stationary process $(Y_n)_{n \in \mathbb{Z}}$ with doubly infinite time. In
particular, we do not at all assume that the observations originate from the
model governing the dynamics of the particles; indeed, in the framework we
consider, we do not even assume that the observations originate from an HMM.

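3. Main results and assumptions. Before listing our main assumptions,
we recall the definition of an $r$-local Doeblin set.

Definition 2. Let $r \in \mathbb{N}^*$. A set $C \in \mathcal{X}$ is $r$-local
Doeblin with respect to $\{Q, g\}$ if there exist positive functions
$\epsilon_C^- : \mathsf{Y}^r \to \mathbb{R}_+$ and
$\epsilon_C^+ : \mathsf{Y}^r \to \mathbb{R}_+$, a family
$\{\mu_C\langle z\rangle; z \in \mathsf{Y}^r\}$ of probability measures, and a
family $\{\varphi_C\langle z\rangle; z \in \mathsf{Y}^r\}$ of positive functions
such that for all $z \in \mathsf{Y}^r$, $\mu_C\langle z\rangle(C) = 1$ and for all
$A \in \mathcal{X}$ and $x \in C$,

(14)    $$\epsilon_C^-(z)\,\varphi_C\langle z\rangle(x)\,\mu_C\langle z\rangle(A) \le \mathbf{L}\langle z\rangle(x, A \cap C) \le \epsilon_C^+(z)\,\varphi_C\langle z\rangle(x)\,\mu_C\langle z\rangle(A).$$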
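
(A1) The process $(Y_n)_{n \in \mathbb{Z}}$ is strictly stationary. Moreover, there
exist an integer $r \in \mathbb{N}^*$ and a set $K \in \mathcal{Y}^{\otimes r}$ such
that the following holds.

(i) The process $(Z_n)_{n \in \mathbb{Z}}$, where
$Z_n \triangleq Y_{nr}^{(n+1)r-1}$, is ergodic and such that
$\mathbb{P}(Z_0 \in K) > 2/3$.

(ii) For all $\eta > 0$ there exists an $r$-local Doeblin set
$C \in \mathcal{X}$ such that for all $y_0^{r-1} \in K$,

(15)    $$\sup_{x \in C^c} \mathbf{L}\langle y_0^{r-1}\rangle(x, \mathsf{X}) \le \eta \sup_{x \in \mathsf{X}} \mathbf{L}\langle y_0^{r-1}\rangle(x, \mathsf{X}) < \infty$$

and

(16)    $$\inf_{y_0^{r-1} \in K} \frac{\epsilon_C^-(y_0^{r-1})}{\epsilon_C^+(y_0^{r-1})} > 0,$$

where the functions $\epsilon_C^+$ and $\epsilon_C^-$ are given in Definition 2.

(iii) There exists a set $D \in \mathcal{X}$ such that

(17)    $$\mathbb{E}\left(\ln^- \inf_{x \in D} \delta_x \mathbf{L}\langle Y_0^{r-1}\rangle\mathbb{1}_D\right) < \infty.$$

(A2) (i) $g(x, y) > 0$ for all $(x, y) \in \mathsf{X} \times \mathsf{Y}$.
(ii) $\mathbb{E}\left(\ln^+ \sup_{x \in \mathsf{X}} g(x, Y_0)\right) < \infty$.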

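Remark 3. In the case $r = 1$ we may replace (A1) by the simpler
assumption that there exists a set $K \in \mathcal{Y}$ such that the following
holds.

(i) $\mathbb{P}(Y_0 \in K) > 2/3$.
(ii) For all $\eta > 0$ there exists a local Doeblin set $C \in \mathcal{X}$ such
that for all $y \in K$,

(18)    $$\sup_{x \in C^c} g(x, y) \le \eta\,\|g(\cdot, y)\|_\infty < \infty.$$

(iii) There exists a set $D \in \mathcal{X}$ satisfying

$$\inf_{x \in D} Q(x, D) > 0 \quad \text{and} \quad \mathbb{E}\left(\ln^- \inf_{x \in D} g(x, Y_0)\right) < \infty.$$

For the integer $r \in \mathbb{N}^*$ and the set $D \in \mathcal{X}$ given in (A1),
define $\mathcal{M}(D, r) \subset \mathcal{P}(\mathsf{X}, \mathcal{X})$ by

(19)    $$\mathcal{M}(D, r) \triangleq \left\{\chi \in \mathcal{P}(\mathsf{X}, \mathcal{X}) : \mathbb{E}\left(\ln^- \chi\mathbf{L}\langle Y_0^{u-1}\rangle\mathbb{1}_D\right) < \infty \text{ for all } u \in \{0, \ldots, r\}\right\}.$$

A simple sufficient condition can be proposed to ensure that
$\chi \in \mathcal{M}(D, r)$.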
Proposition 4. Assume that there exists a sequence of sets
$D_u \in \mathcal{X}$, $u \in \{0, \ldots, r-1\}$, such that (setting $D_r = D$ for
notational convenience) for some $\delta > 0$,

(20)    $$\inf_{x \in D_{u-1}} Q(x, D_u) \ge \delta, \qquad u \in \{1, \ldots, r\},$$

and

(21)    $$\mathbb{E}\left(\ln^- \inf_{x \in D_u} g(x, Y_0)\right) < \infty, \qquad u \in \{0, \ldots, r\}.$$

Then any initial distribution $\chi \in \mathcal{P}(\mathsf{X}, \mathcal{X})$
satisfying $\chi(D_0) > 0$ belongs to $\mathcal{M}(D, r)$.

Remark 5. To check (21) we typically assume that for any given
$y \in \mathsf{Y}$, the function $x \mapsto g(x, y)$ is continuous and that $D_i$,
$i \in \{0, \ldots, r-1\}$, are compact sets. This condition then translates into
an assumption on some generalized moments of the process
$(Y_n)_{n \in \mathbb{Z}}$.

Remark 6. Assume that $\mathsf{X} = \mathbb{R}^d$ for some $d \in \mathbb{N}^*$ (or
more generally, $\mathsf{X}$ is a locally compact separable metric space) and that
$\mathcal{X}$ is the associated Borel $\sigma$-field. Assume in addition that for
any open subset $O \in \mathcal{X}$, the function $x \mapsto Q(x, O)$ is lower
semi-continuous on the space $\mathsf{X}$. Then for any $\delta > 0$ and any
compact subset $D_0 \in \mathcal{X}$, there exists a sequence of compact subsets
$D_u$, $u \in \{0, \ldots, r-1\}$, satisfying (20).

We are now ready to state our main result. 

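Theorem 7. Assume (A1-2). Then for all $\chi \in \mathcal{M}(D, r)$ and all
$h \in \mathcal{F}(\mathsf{X})$, the sequence
$(\sigma^2_\chi\langle Y_0^{n-1}\rangle(h))_{n \in \mathbb{N}^*}$ (defined in (12))
is tight.

Proof of Theorem 7. Using the definition (9) of the predictive
distribution and the decomposition (8) of the likelihood, we get for all
$k \in \{0, \ldots, n-1\}$,

$$\phi_\chi\langle Y_0^{n-1}\rangle h
= \frac{\chi\mathbf{L}\langle Y_0^{n-1}\rangle h}{\chi\mathbf{L}\langle Y_0^{n-1}\rangle\mathbb{1}_{\mathsf{X}}}
= \frac{\chi\mathbf{L}\langle Y_0^{k-1}\rangle\mathbf{L}\langle Y_k^{n-1}\rangle h}{\chi\mathbf{L}\langle Y_0^{k-1}\rangle\mathbf{L}\langle Y_k^{n-1}\rangle\mathbb{1}_{\mathsf{X}}}.$$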



Plugging this identity into the expression (12) of the asymptotic variance 
yields 



n „ 



A 



,{Yt')MYr'nx? 
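
where for all sequences $y_k^{n-1} \in \mathsf{Y}^{n-k}$, functions $f$ and $h$ in
$\mathcal{F}(\mathsf{X})$, and probability measures $\chi$ and $\chi'$ in
$\mathcal{P}(\mathsf{X}, \mathcal{X})$,

(22)    $$\Delta_{\chi,\chi'}\langle y_k^{n-1}\rangle(f, h) \triangleq \chi\mathbf{L}\langle y_k^{n-1}\rangle f \times \chi'\mathbf{L}\langle y_k^{n-1}\rangle h - \chi\mathbf{L}\langle y_k^{n-1}\rangle h \times \chi'\mathbf{L}\langle y_k^{n-1}\rangle f.$$
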
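Using (9), we obtain for all sequences $y_0^{n-1} \in \mathsf{Y}^n$,

$$\phi_\chi\langle y_0^{k-1}\rangle\mathbf{L}\langle y_k^{n-1}\rangle\mathbb{1}_{\mathsf{X}}
= \frac{\chi\mathbf{L}\langle y_0^{n-1}\rangle\mathbb{1}_{\mathsf{X}}}{\chi\mathbf{L}\langle y_0^{k-1}\rangle\mathbb{1}_{\mathsf{X}}}
= \prod_{\ell=k}^{n-1} \pi_\chi\langle y_0^{\ell-1}\rangle(y_\ell),$$

where $\pi_\chi\langle y_0^{\ell-1}\rangle(y_\ell)$ is the density of the
conditional distribution of $Y_\ell$ given $Y_0^{\ell-1}$ (i.e. the one-step
observation predictor at time $\ell$) defined by

(23)    $$\pi_\chi\langle y_0^{\ell-1}\rangle(y_\ell) \triangleq \int \phi_\chi\langle y_0^{\ell-1}\rangle(dx)\,g(x, y_\ell).$$

With this notation, the likelihood function
$\chi\mathbf{L}\langle y_0^{n-1}\rangle\mathbb{1}_{\mathsf{X}}$ equals the product
$\prod_{k=0}^{n-1} \pi_\chi\langle y_0^{k-1}\rangle(y_k)$ (where we let
$\pi_\chi\langle y_0^{-1}\rangle(y_0)$ denote the marginal density of $Y_0$).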
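The factorization of the likelihood into one-step observation predictor densities can be verified numerically on a small finite model. The sketch below assumes finite state and observation spaces (so that $\nu$ is the counting measure and `g[x, y]` is a probability table); the function names and numbers are illustrative assumptions. It accumulates $\ln \pi_\chi\langle y_0^{k-1}\rangle(y_k)$ along the recursion (10) and compares the result with a brute-force sum over all state paths.

```python
import itertools

import numpy as np


def log_likelihood(chi, Q, g, ys):
    """ln chi L<y_0^{n-1}> 1_X as the sum of ln pi_chi<y_0^{k-1}>(y_k)."""
    mu, total = chi.copy(), 0.0
    for y in ys:
        weighted = mu * g[:, y]
        pi = weighted.sum()          # one-step observation predictor, cf. (23)
        total += np.log(pi)
        mu = (weighted @ Q) / pi     # predictor update, cf. (10)
    return total


def brute_force_likelihood(chi, Q, g, ys):
    """Sum of chi(x_0) g(x_0, y_0) Q(x_0, x_1) g(x_1, y_1) ... over all state paths."""
    n, d = len(ys), len(chi)
    total = 0.0
    for path in itertools.product(range(d), repeat=n):
        p = chi[path[0]] * g[path[0], ys[0]]
        for k in range(1, n):
            p *= Q[path[k - 1], path[k]] * g[path[k], ys[k]]
        total += p
    return total


# Illustrative two-state, two-symbol model (values are assumptions).
chi = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3],
              [0.2, 0.8]])
g = np.array([[0.9, 0.1],
              [0.3, 0.7]])
ys = [0, 1, 1, 0]

ll = log_likelihood(chi, Q, g, ys)
direct = brute_force_likelihood(chi, Q, g, ys)
print(np.exp(ll), direct)
```

The recursive evaluation costs a number of operations linear in the number of observations, whereas the brute-force sum grows exponentially; the two agree up to floating point error.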

Now, using coupling results obtained in [13] one may prove that the
predictor distribution forgets its initial distribution exponentially fast under
the $r$-local Doeblin assumption (14). Moreover, this implies that also the
log-density of the one-step observation predictor forgets its initial
distribution exponentially fast, i.e. for all initial distributions $\chi$ and
$\chi'$ there is a deterministic constant $\beta \in\, ]0, 1[$ and an almost
surely bounded random variable $C_{\chi,\chi'}$ such that for all
$(k, m) \in \mathbb{N}^* \times \mathbb{N}$ and almost all observation sequences,

(24)    $$\left|\ln \pi_\chi\langle Y_{-m}^{k-1}\rangle(Y_k) - \ln \pi_{\chi'}\langle Y_{-m}^{k-1}\rangle(Y_k)\right| \le C_{\chi,\chi'}\,\beta^{k+m}.$$



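Using this, it is shown in [13, Proposition 1] that

(i) there exists a function
$\pi : \mathsf{Y}^{\mathbb{Z}^-} \times \mathsf{Y} \to \mathbb{R}$ such that for all
probability measures $\chi \in \mathcal{M}(D, r)$,

$$\lim_{m \to \infty} \pi_\chi\langle Y_{-m}^{-1}\rangle(Y_0) = \pi\langle Y_{-\infty}^{-1}\rangle(Y_0), \quad \mathbb{P}\text{-a.s.}$$

Moreover,

(25)    $$\mathbb{E}\left(\left|\ln \pi\langle Y_{-\infty}^{-1}\rangle(Y_0)\right|\right) < \infty.$$

(ii) for all probability measures $\chi \in \mathcal{M}(D, r)$, the normalized
log-likelihood function converges according to

(26)    $$\lim_{n \to \infty} n^{-1} \ln \chi\mathbf{L}\langle Y_0^{n-1}\rangle\mathbb{1}_{\mathsf{X}} = \ell_\infty, \quad \mathbb{P}\text{-a.s.},$$

where $\ell_\infty$ is the negated relative entropy, i.e. the expectation of
$\ln \pi\langle Y_{-\infty}^{-1}\rangle(Y_0)$ under the stationary distribution,
i.e.

(27)    $$\ell_\infty \triangleq \mathbb{E}\left(\ln \pi\langle Y_{-\infty}^{-1}\rangle(Y_0)\right).$$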



As a first step, we bound the asymptotic variance
$\sigma^2_\chi\langle Y_0^{n-1}\rangle(h)$ (defined in (12)) by the product of two
quantities, namely $\sigma^2_\chi\langle Y_0^{n-1}\rangle(h) \le A \times B_n$, where



4 



,28) A-{ sup n ^"^T?"" I ■ 

The quantity (28) can be bounded using the exponential forgetting (24) of 
the one-step predictor log-density. More precisely, note that 

xL(y„„)lx 

thus, by applying Proposition 11(ii) we conclude that there exist
$\beta \in\, ]0, 1[$ and a $\mathbb{P}$-a.s. finite random variable $C_\chi$ such
that for all $n \in \mathbb{N}$,

(30) n^^|g§^^nn-'^->«> 



n oo 

< n n ^MCxP'^n < exp(C^/(l - (3f) < oo, P-a.s., 



e=k m=0 

implying that A is indeed P-a.s. finite. 

Consider now the second quantity (29). Since the process
$(Y_n)_{n \in \mathbb{Z}}$ is strictly stationary, $Y_0^{n-1}$ has the same
distribution as $Y_{-n}^{-1}$ for all $n \in \mathbb{N}^*$. Therefore, for all
$n \in \mathbb{N}^*$, the random variable $B_n$ has the same distribution as

~ ^ v^ /sup^GxIA^^^^ /y-m- i)(ir^)(/i,lx) 
(31) Bn=^ ' "" 



^=0V [UT=MYii-'){Y^iW 




We will show that $\sup_{n \in \mathbb{N}^*} \tilde{B}_n$ is P-a.s. finite, which implies that the sequence $(B_n)_{n \in \mathbb{N}^*}$ is tight. We split each term of $\tilde{B}_n$ into two factors according to

sup.gx \A,^^^^^y-^-i^{Yl^){h, lx)| 

( ||L(y_-i)lx||oc \ ' ^^P-ex |A^^^^^^y_-™-i^(y_-J,)(/i, lx)| 



WT=i^{y-^'){Y-,) WMyiDM 



2 

oo 



and consider each factor separately. 

We will show that the first factor in (32) grows at most subgeometrically 
fast. Indeed, note that 

( ||L(y_-^)ix||oo V , , 



where 



WT=i^{y-^'){y- 



9 / m 

A ^ 



in||L(yr^)ix|L- j;in^(yriri)(y_,) . 



\ £=1 / 

According to Lemma 12, $\varepsilon_m \to 2(\ell_\infty - \ell_\infty) = 0$, P-a.s., as $m \to \infty$.

The second factor in (32) is handled using Proposition 11(iii), which guarantees the existence of a constant $\beta \in\, ]0,1[$ and a P-a.s. finite random variable $C$ such that for all $(m,n) \in (\mathbb{N}^*)^2$,

(33) tlAJ .2 <C^/3'"l|/^lloo- 

This concludes the proof. □

Having established tightness of the asymptotic variance, we obtain the asymptotic $L^p$ error given in Theorem 8 below in two steps. First, for fixed time indices $n$, a standard exponential deviation inequality yields uniform integrability (with respect to the particle sample size $N$) of the sequence of normalized $L^p$ errors. Second, weak convergence combined with uniform integrability implies convergence of moments, which in turn implies convergence of the $L^p$ error.

Theorem 8. Assume that the sequence $(\sigma^2_\chi(Y_0^{n-1})(h))_{n \in \mathbb{N}^*}$ (defined in (12)) is tight for all functions $h \in \mathcal{F}(\mathsf{X})$. Then, for all functions $h \in \mathcal{F}(\mathsf{X})$, constants $p \in \mathbb{R}_+^*$, and initial distributions $\chi \in \mathcal{M}(\mathsf{D}, r)$ it holds, P-a.s.,

$\lim_{N \to \infty} \sqrt{N}\, \mathbb{E}^{1/p}\left( \left| \phi^N_\chi(Y_0^{n-1}) h - \phi_\chi(Y_0^{n-1}) h \right|^p \,\middle|\, Y_0^{n-1} \right) = \sqrt{2}\, \sigma_\chi(Y_0^{n-1})(h) \left( \frac{\Gamma\left( \frac{p+1}{2} \right)}{\sqrt{\pi}} \right)^{1/p}$,

where $\Gamma$ is the gamma function.

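Proof. Recall that if $(A_N)_{N \in \mathbb{N}}$ is a sequence of random variables such that $A_N \overset{\mathcal{D}}{\longrightarrow} A$ as $N \to \infty$ and $(A_N^p)_{N \in \mathbb{N}}$ is uniformly integrable for some $p > 0$, then $\mathbb{E}(|A|^p) < \infty$, $\lim_{N \to \infty} \mathbb{E}(A_N^p) = \mathbb{E}(A^p)$, and $\lim_{N \to \infty} \mathbb{E}(|A_N|^p) = \mathbb{E}(|A|^p)$; see e.g. [27, Theorem A, p. 14]. Now set, for $n \in \mathbb{N}^*$,

$A_{N,\chi}(Y_0^{n-1})(h) := \sqrt{N} \left( \phi^N_\chi(Y_0^{n-1}) h - \phi_\chi(Y_0^{n-1}) h \right)$.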

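For all $q > p$ it holds that

$\sup_{N \in \mathbb{N}^*} \mathbb{E}\left( |A_{N,\chi}(Y_0^{n-1})(h)|^q \,\middle|\, Y_0^{n-1} \right)$
$\quad = \sup_{N \in \mathbb{N}^*} \int_0^\infty \mathbb{P}\left( |A_{N,\chi}(Y_0^{n-1})(h)|^q \geq \epsilon \,\middle|\, Y_0^{n-1} \right) d\epsilon$
$\quad = q \sup_{N \in \mathbb{N}^*} \int_0^\infty \epsilon^{q-1}\, \mathbb{P}\left( |A_{N,\chi}(Y_0^{n-1})(h)| \geq \epsilon \,\middle|\, Y_0^{n-1} \right) d\epsilon$.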



Now, note that (A2)(ii) implies that $\|g(\cdot, Y_n)\|_\infty$ is P-a.s. finite for all $n \in \mathbb{N}$. Thus, the assumptions of [11, Lemma 2.1] are fulfilled (see also [8, Theorem 3.39]), which implies that there exist, for all $n \in \mathbb{N}$, positive constants $B_n$ and $C_n$ such that for all $N \in \mathbb{N}$, all $h \in \mathcal{F}(\mathsf{X})$, and all $\epsilon > 0$,

(34)  $\mathbb{P}\left( |A_{N,\chi}(Y_0^{n-1})(h)| \geq \epsilon \,\middle|\, Y_0^{n-1} \right) \leq B_n \exp(-C_n \epsilon^2)$.

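This implies that for all $n \in \mathbb{N}$, P-a.s.,

$\sup_{N \in \mathbb{N}^*} \mathbb{E}\left( |A_{N,\chi}(Y_0^{n-1})(h)|^q \,\middle|\, Y_0^{n-1} \right) \leq q B_n \int_0^\infty \epsilon^{q-1} \exp(-C_n \epsilon^2)\, d\epsilon < \infty$,

which establishes, via [28, Lemma II.6.3, p. 190], that $(|A_{N,\chi}(Y_0^{n-1})(h)|^p)_{N \in \mathbb{N}}$ is uniformly integrable conditionally on $Y_0^{n-1}$, i.e.

$\lim_{M \to \infty} \sup_{N \in \mathbb{N}^*} \mathbb{E}\left( |A_{N,\chi}(Y_0^{n-1})(h)|^p\, \mathbf{1}\{ |A_{N,\chi}(Y_0^{n-1})(h)|^p \geq M \} \,\middle|\, Y_0^{n-1} \right) = 0$, P-a.s.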

We may now complete the proof by applying Theorem 1, which states that conditionally on $Y_0^{n-1}$, as $N \to \infty$,

$A_{N,\chi}(Y_0^{n-1})(h) \overset{\mathcal{D}}{\longrightarrow} \sigma_\chi(Y_0^{n-1})(h)\, Z$,

where $Z$ is a standard normal-distributed random variable. □
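
Since the limit is a centered Gaussian variable with standard deviation $\sigma$, the limiting $L^p$ error equals $\sigma\,(\mathbb{E}|Z|^p)^{1/p}$, and the gamma function in Theorem 8 enters through the classical identity $\mathbb{E}|Z|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$. The sketch below evaluates this constant; the function names are illustrative and not part of the paper.

```python
import math


def abs_normal_moment(p):
    """Return E|Z|^p for a standard normal Z, valid for p > 0.

    Uses the classical identity E|Z|^p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi).
    """
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def limiting_lp_error(sigma, p):
    """Return (E|sigma * Z|^p)^(1/p), the L^p norm of the Gaussian limit."""
    return sigma * abs_normal_moment(p) ** (1.0 / p)
```

For $p = 2$ the constant reduces to $\sigma$ itself, so the normalized $L^2$ error converges to the asymptotic standard deviation.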

4. Applications. In this section, we develop two classes of examples. In Section 4.1 we consider linear Gaussian state-space models, an important model class that is used routinely in time-series analysis. Recall that in the linear Gaussian case, closed-form solutions to the optimal filtering problem can be obtained using the Kalman recursions. Nevertheless, as an illustration, we analyze this model class under very general assumptions. In Section 4.2, we consider a significantly more general class of nonlinear state-space models. In both examples we will find that Assumptions (A1-2) are satisfied and straightforward to verify.

4.1. Linear Gaussian state-space models. The linear Gaussian state-space models form an important class of HMMs. Let $\mathsf{X} = \mathbb{R}^{d_x}$ and $\mathsf{Y} = \mathbb{R}^{d_y}$ and define state and observation sequences through the linear dynamic system

$X_{k+1} = A X_k + R U_k$,
$Y_k = B X_k + S V_k$,

where $(U_k, V_k)_{k \geq 0}$ is an i.i.d. sequence of Gaussian vectors with zero mean and identity covariance matrix. The noise vectors are assumed to be independent of $X_0$. Here $U_k$ is $d_u$-dimensional, $V_k$ is $d_v$-dimensional, and the matrices $A$, $R$, $B$, and $S$ have the appropriate dimensions.
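
To make the object of study concrete, here is a minimal sketch of a bootstrap particle filter for the scalar case of the system above ($d_x = d_y = d_u = d_v = 1$). The standard normal initial distribution, the particle count, the multinomial resampling scheme, and all function names are illustrative assumptions rather than choices made in the paper.

```python
import math
import random


def simulate_observations(n, a, r, b, s, seed=1):
    """Draw Y_0, ..., Y_{n-1} from X_{k+1} = a X_k + r U_k, Y_k = b X_k + s V_k."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # assumed initial law N(0, 1)
    ys = []
    for _ in range(n):
        ys.append(b * x + s * rng.gauss(0.0, 1.0))
        x = a * x + r * rng.gauss(0.0, 1.0)
    return ys


def bootstrap_filter(ys, a, r, b, s, n_particles=500, seed=0):
    """Return particle estimates of E[X_k | Y_0, ..., Y_k] for each k."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for k, y in enumerate(ys):
        if k > 0:
            # Mutation: move every particle through the state dynamics.
            particles = [a * x + r * rng.gauss(0.0, 1.0) for x in particles]
        # Weighting: Gaussian log-likelihood of y given each particle,
        # shifted by its maximum so that the largest weight equals one.
        logw = [-0.5 * ((y - b * x) / s) ** 2 for x in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        means.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # Selection: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means
```

A call such as `bootstrap_filter(simulate_observations(50, 0.9, 0.5, 1.0, 0.3), 0.9, 0.5, 1.0, 0.3)` returns one filter mean per observation. In this linear Gaussian setting the output can be compared against the Kalman recursions, which is one reason the model class is a convenient test case.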

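For any $n \in \mathbb{N}$, define the observability and controllability matrices $\mathcal{O}_n$ and $\mathcal{C}_n$ by

(35)  $\mathcal{O}_n := \begin{bmatrix} B \\ BA \\ BA^2 \\ \vdots \\ BA^{n-1} \end{bmatrix}$ and $\mathcal{C}_n := \begin{bmatrix} A^{n-1}R & A^{n-2}R & \cdots & R \end{bmatrix}$,

respectively. We assume the following.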

(LGSS1) The pair $(A, B)$ is observable and the pair $(A, R)$ is controllable, i.e. there exists $r \in \mathbb{N}$ such that the observability matrix $\mathcal{O}_r$ and the controllability matrix $\mathcal{C}_r$ have full rank.

(LGSS2) The measurement noise covariance matrix $S$ has full rank.

(LGSS3) $\mathbb{E}(\|Y_0\|^2) < \infty$.
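
Condition (LGSS1) can be checked mechanically for a given pair of matrices. The following sketch builds the matrices in (35) from lists of rows and computes their rank by exact Gaussian elimination; the helper names and the small example system in the usage note are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Rank of a matrix, by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


def observability(a, b, n):
    """Stack B, BA, ..., BA^(n-1) vertically."""
    blocks, cur = [], b
    for _ in range(n):
        blocks.extend(cur)
        cur = matmul(cur, a)
    return blocks


def controllability(a, r, n):
    """Concatenate A^(n-1) R, ..., A R, R horizontally."""
    powers, cur = [], r
    for _ in range(n):
        powers.append(cur)
        cur = matmul(a, cur)
    powers.reverse()
    return [sum((p[i] for p in powers), []) for i in range(len(r))]
```

For instance, with `A = [[1, 1], [0, 1]]`, `B = [[1, 0]]`, and `R = [[0], [1]]` (a position-velocity system observed through its position and driven through its velocity), both `observability(A, B, 2)` and `controllability(A, R, 2)` have rank $d_x = 2$, so (LGSS1) holds with $r = 2$ but not with $r = 1$.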

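We now check Assumptions (A1-2). The dimension $d_u$ of the state noise vector $U_k$ is in many situations smaller than the dimension $d_x$ of the state vector $X_k$ and hence $R R^{\top}$ may be rank deficient (here $\top$ denotes the transpose). Some additional notation is required: For any positive matrix $A$ and vector $z$ of appropriate dimension, denote $\|z\|_A^2 := z^{\top} A^{-1} z$. In addition, define for any $n \in \mathbb{N}$,

(36)  $\mathcal{F}_n := \mathcal{D}_n \mathcal{D}_n^{\top} + \mathcal{S}_n \mathcal{S}_n^{\top}$,

where

$\mathcal{D}_n := \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ BR & 0 & \cdots & 0 & 0 \\ BAR & BR & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ BA^{n-2}R & BA^{n-3}R & \cdots & BR & 0 \end{bmatrix}$, $\qquad \mathcal{S}_n := \begin{bmatrix} S & 0 & \cdots & 0 \\ 0 & S & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & S \end{bmatrix}$.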



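Under (LGSS2), the matrix $\mathcal{F}_n$ is positive definite for any $n \geq r$. When the state process is initialized at $x_0 \in \mathsf{X}$, the likelihood of the observations $y_0^{n-1} \in \mathsf{Y}^n$ is given by

$\delta_{x_0} L(y_0^{n-1}) \mathbf{1}_{\mathsf{X}} = (2\pi)^{-n d_y/2} \det\nolimits^{-1/2}(\mathcal{F}_n) \exp\left( -\tfrac{1}{2} \left\| \mathbf{y}_{n-1} - \mathcal{O}_n x_0 \right\|_{\mathcal{F}_n}^2 \right)$,

where $\mathbf{y}_{n-1} = [y_0^{\top}, y_1^{\top}, \ldots, y_{n-1}^{\top}]^{\top}$ and $\mathcal{O}_n$ is defined in (35).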

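We first consider (A1). Under (LGSS1), the observability matrix $\mathcal{O}_r$ is full rank, and we have for any compact subset $\mathsf{K} \subset \mathsf{Y}^r$,

$\lim_{\|x_0\| \to \infty} \inf_{\mathbf{y}_{r-1} \in \mathsf{K}} \left\| \mathbf{y}_{r-1} - \mathcal{O}_r x_0 \right\|_{\mathcal{F}_r}^2 = \infty$,

showing that for all $\eta > 0$, we may choose a compact set $C \subset \mathbb{R}^{d_x}$ such that (18) is satisfied. It remains to prove that any compact set $C$ is an $r$-local Doeblin set satisfying the condition (16). For any $y_0^{r-1} \in \mathsf{Y}^r$ and $x_0 \in \mathsf{X}$, the measure $\delta_{x_0} L(y_0^{r-1})$ is absolutely continuous with respect to the Lebesgue measure on $(\mathsf{X}, \mathcal{X})$ with Radon-Nikodym derivative $\ell(y_0^{r-1})(x_0, x_r)$ given (up to an irrelevant multiplicative factor) by

(37)  $\ell(y_0^{r-1})(x_0, x_r) \propto \det\nolimits^{-1/2}(\mathcal{G}_r) \exp\left( -\frac{1}{2} \left\| \begin{bmatrix} \mathbf{y}_{r-1} \\ x_r \end{bmatrix} - \begin{bmatrix} \mathcal{O}_r \\ A^r \end{bmatrix} x_0 \right\|_{\mathcal{G}_r}^2 \right)$,

where the covariance matrix $\mathcal{G}_r$ is

$\mathcal{G}_r := \begin{bmatrix} \mathcal{D}_r \\ \mathcal{C}_r \end{bmatrix} \begin{bmatrix} \mathcal{D}_r^{\top} & \mathcal{C}_r^{\top} \end{bmatrix} + \begin{bmatrix} \mathcal{S}_r \\ 0 \end{bmatrix} \begin{bmatrix} \mathcal{S}_r^{\top} & 0 \end{bmatrix}$.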
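The proof of (37) relies on the positivity of $\mathcal{G}_r$, which requires further discussion. By construction, the matrix $\mathcal{G}_r$ is non-negative. For all $\mathbf{y}_{r-1} \in \mathsf{Y}^r$ and $x \in \mathsf{X}$, the equation

$\begin{bmatrix} \mathbf{y}_{r-1}^{\top} & x^{\top} \end{bmatrix} \mathcal{G}_r \begin{bmatrix} \mathbf{y}_{r-1} \\ x \end{bmatrix} = \left\| \mathcal{D}_r^{\top} \mathbf{y}_{r-1} + \mathcal{C}_r^{\top} x \right\|^2 + \left\| \mathcal{S}_r^{\top} \mathbf{y}_{r-1} \right\|^2 = 0$

implies that $\| \mathcal{D}_r^{\top} \mathbf{y}_{r-1} + \mathcal{C}_r^{\top} x \|^2 = 0$ and $\| \mathcal{S}_r^{\top} \mathbf{y}_{r-1} \|^2 = 0$. Since the matrix $\mathcal{S}_r$ has full rank, this implies that $\mathbf{y}_{r-1} = 0$. Since also $\mathcal{C}_r$ has full rank (the pair $(A, R)$ is controllable), this implies in turn that $x = 0$. Therefore, the matrix $\mathcal{G}_r$ is positive definite and the function

$(x_0, x_r) \mapsto \left\| \begin{bmatrix} \mathbf{y}_{r-1} \\ x_r \end{bmatrix} - \begin{bmatrix} \mathcal{O}_r \\ A^r \end{bmatrix} x_0 \right\|_{\mathcal{G}_r}^2$

is continuous for all $\mathbf{y}_{r-1}$. It is therefore bounded on any compact subset of $\mathsf{X}^2$. This implies that every non-empty compact set $C \subset \mathbb{R}^{d_x}$ is an $r$-local Doeblin set, with $\lambda_C(\cdot) = \lambda^{\mathrm{Leb}}(\cdot) / \lambda^{\mathrm{Leb}}(C)$ and

$\epsilon_C^-(y_0^{r-1}) = \lambda^{\mathrm{Leb}}(C) \inf_{(x_0, x_r) \in C^2} \ell(y_0^{r-1})(x_0, x_r)$,

$\epsilon_C^+(y_0^{r-1}) = \lambda^{\mathrm{Leb}}(C) \sup_{(x_0, x_r) \in C^2} \ell(y_0^{r-1})(x_0, x_r)$.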

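Consequently, condition (16) is satisfied for any compact set $\mathsf{K} \subset \mathsf{Y}^r$. It remains to verify (A1)(iii). Under (LGSS1), the measure $\delta_{x_0} L(y_0^{r-1})$ is absolutely continuous with respect to the Lebesgue measure $\lambda^{\mathrm{Leb}}$; therefore, for any set $\mathsf{D} \subset \mathbb{R}^{d_x}$,

$\inf_{x_0 \in \mathsf{D}} \delta_{x_0} L(y_0^{r-1}) \mathbf{1}_{\mathsf{D}} \geq \inf_{(x_0, x_r) \in \mathsf{D}^2} \ell(y_0^{r-1})(x_0, x_r)\, \lambda^{\mathrm{Leb}}(\mathsf{D})$.

Take $\mathsf{D}$ to be any compact set with positive Lebesgue measure. Now,

$\sup_{(x_0, x_r) \in \mathsf{D}^2} \left\| \begin{bmatrix} \mathbf{y}_{r-1} \\ x_r \end{bmatrix} - \begin{bmatrix} \mathcal{O}_r \\ A^r \end{bmatrix} x_0 \right\|_{\mathcal{G}_r}^2 \leq 2 \lambda_{\max}(\mathcal{G}_r^{-1}) \left\{ \|\mathbf{y}_{r-1}\|^2 + \max_{x \in \mathsf{D}} \|x\|^2 \left[ 1 + \lambda_{\max}\left( \mathcal{O}_r^{\top} \mathcal{O}_r + (A^r)^{\top} A^r \right) \right] \right\}$,

where $\lambda_{\max}(A)$ is the largest eigenvalue of $A$. Under (LGSS3), $\mathbb{E}(\|Y_0\|^2) < \infty$, implying that (A1)(iii) is satisfied for any compact set.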




We now consider (A2). Under (LGSS2), $S$ has full rank, and taking the reference measure $\lambda^{\mathrm{Leb}}$ as the Lebesgue measure on $\mathsf{Y}$, $g(x, y)$ is, for each $x \in \mathsf{X}$, a Gaussian density with covariance matrix $S S^{\top}$. We therefore have

$\sup_{x \in \mathsf{X}} g(x, y) = (2\pi)^{-d_y/2} \det\nolimits^{-1/2}(S S^{\top}) < \infty$

for all $y \in \mathsf{Y}$, which verifies (A2)(i-ii).
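
The bound on the measurement density can be spelled out in one line. In the sketch below, $\Sigma_V$ is an abbreviation introduced here for the covariance matrix of the measurement noise term $S V_k$; it is assumed to be positive definite, which is what (LGSS2) provides.

```latex
g(x, y)
  = (2\pi)^{-d_y/2} \det\nolimits^{-1/2}(\Sigma_V)
    \exp\Bigl( -\tfrac{1}{2} (y - Bx)^{\top} \Sigma_V^{-1} (y - Bx) \Bigr)
  \leq (2\pi)^{-d_y/2} \det\nolimits^{-1/2}(\Sigma_V),
```

The inequality holds because the quadratic form in the exponent is non-negative, and it becomes an equality whenever $y = Bx$ for some $x$. The right-hand side depends on neither $x$ nor $y$, so the supremum over $x$ is bounded uniformly in $y$.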

To conclude this discussion, we need to specify more explicitly the set $\mathcal{M}(\mathsf{D}, r)$ (see (19)) of possible initial distributions. Using Proposition 4, we verify the sufficient conditions (20) and (21). To check (20), we use Remark 6: For any open subset $\mathsf{O} \subset \mathbb{R}^{d_x}$ and $x \in \mathsf{X}$, $Q(x, \mathsf{O}) = \mathbb{E}(\mathbf{1}_{\mathsf{O}}(Ax + RU))$, where the expectation is taken with respect to the $d_u$-dimensional standard normal distribution. Let $(x_n)_{n \in \mathbb{N}^*}$ be a sequence in $\mathsf{X}$ converging to $x$. Since the function $\mathbf{1}_{\mathsf{O}}$ is lower semi-continuous we obtain, via Fatou's lemma,

$\liminf_{n \to \infty} Q(x_n, \mathsf{O}) \geq \mathbb{E}\left( \liminf_{n \to \infty} \mathbf{1}_{\mathsf{O}}(A x_n + RU) \right) \geq Q(x, \mathsf{O})$,

showing that the function $x \mapsto Q(x, \mathsf{O})$ is lower semi-continuous for any open subset $\mathsf{O}$.

Assumption (LGSS2) implies that for all $(x, y) \in \mathsf{X} \times \mathsf{Y}$,

$\ln g(x, y) \geq -\frac{d_y}{2} \ln(2\pi) - \frac{1}{2} \ln \det(S S^{\top}) - [\lambda_{\min}(S S^{\top})]^{-1} \left( \|y\|^2 + \|Bx\|^2 \right)$,

where $\lambda_{\min}(S S^{\top})$ is the minimal eigenvalue of $S S^{\top}$. Therefore (21) is satisfied under (LGSS3). Consequently, we may apply Theorem 7 to establish tightness of the asymptotic variance for any initial distribution $\chi \in \mathcal{P}(\mathsf{X}, \mathcal{X})$ as soon as the process $(Y_k)_{k \in \mathbb{Z}}$ is strictly stationary ergodic and $\mathbb{E}(\|Y_0\|^2) < \infty$.

4.2. Nonlinear state-space models. We now turn to a very general class of nonlinear state-space models. Let $\mathsf{X} = \mathbb{R}^d$, $\mathsf{Y} = \mathbb{R}^{\ell}$, and $\mathcal{X}$ and $\mathcal{Y}$ be the associated Borel $\sigma$-fields. In the following we assume that for each $x \in \mathsf{X}$, the probability measure $Q(x, \cdot)$ has a density $q(x, \cdot)$ with respect to the Lebesgue measure $\lambda^{\mathrm{Leb}}$ on $\mathbb{R}^d$. For instance, the state sequence $(X_k)_{k \in \mathbb{N}}$ could be defined through some nonlinear recursion

(38)  $X_k = T(X_{k-1}) + \Sigma(X_{k-1}) \zeta_k$,

where $(\zeta_k)_{k \in \mathbb{N}^*}$ is an i.i.d. sequence of $d$-dimensional random vectors with density $\rho_\zeta$ with respect to the Lebesgue measure $\lambda^{\mathrm{Leb}}$ on $\mathbb{R}^d$. Here $T : \mathbb{R}^d \to \mathbb{R}^d$ and $\Sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ are given (measurable) matrix-valued functions such that $\Sigma(x)$ is full rank for each $x \in \mathsf{X}$. Such models (38) are sometimes referred to as vector autoregressive conditional heteroscedasticity (ARCH) models and cover many models of interest in time series analysis and financial econometrics. In this context, we let the observations $(Y_k)_{k \in \mathbb{N}}$ be generated through a given measurement density $g(x, y)$ (again with respect to the Lebesgue measure).
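
As an illustration of the recursion (38), the sketch below simulates a one-dimensional instance. The standard normal noise and the particular drift and scale functions are hypothetical choices made only for this example; the paper leaves these functions general.

```python
import math
import random


def simulate_arch(n, drift, scale, x0=0.0, seed=0):
    """Simulate X_k = drift(X_{k-1}) + scale(X_{k-1}) * zeta_k in dimension one.

    The innovations zeta_k are i.i.d. standard normal (an assumed noise law).
    Returns the list [X_0, X_1, ..., X_n].
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        prev = xs[-1]
        xs.append(drift(prev) + scale(prev) * rng.gauss(0.0, 1.0))
    return xs


def example_drift(x):
    """Hypothetical contracting drift."""
    return 0.5 * x


def example_scale(x):
    """Hypothetical ARCH(1)-style scale; strictly positive for every x."""
    return math.sqrt(0.2 + 0.3 * x * x)
```

Because `example_scale` never vanishes, the one-dimensional analogue of the full-rank requirement holds, and the transition density $q(x, x')$ is a Gaussian density in $x'$ that is positive and jointly continuous.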

We now introduce the basic assumptions of this section. 

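(NL1) The function $(x, x') \mapsto q(x, x')$ is a positive continuous function on $\mathsf{X}^2$. In addition, $\sup_{(x, x') \in \mathsf{X}^2} q(x, x') < \infty$.

(NL2) For any compact subset $\mathsf{K} \subset \mathsf{Y}$,

$\lim_{\|x\| \to \infty} \sup_{y \in \mathsf{K}} \frac{g(x, y)}{\sup_{x' \in \mathsf{X}} g(x', y)} = 0$.

(NL3) For all $(x, y) \in \mathsf{X} \times \mathsf{Y}$, $g(x, y) > 0$ and

$\mathbb{E}\left( \ln^+ \sup_{x \in \mathsf{X}} g(x, Y_0) \right) < \infty$.

(NL4) There exists a compact subset $\mathsf{D} \subset \mathsf{X}$ such that

$\mathbb{E}\left( \ln^- \inf_{x \in \mathsf{D}} g(x, Y_0) \right) < \infty$.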
\ xeD J 

Under (NL1), every compact set $C \subset \mathsf{X} = \mathbb{R}^d$ with positive Lebesgue measure is 1-small and therefore local Doeblin with $\lambda_C(\cdot) = \lambda^{\mathrm{Leb}}(\cdot \cap C) / \lambda^{\mathrm{Leb}}(C)$, $\varphi_C(y_0) = \lambda^{\mathrm{Leb}}(C)$, and

$\epsilon_C^- = \inf_{(x, x') \in C^2} q(x, x')$,

$\epsilon_C^+ = \sup_{(x, x') \in C^2} q(x, x')$.

Under (NL1) and (NL2), the conditions (18) and (16) are satisfied with $r = 1$. In addition, (17) is implied by (NL1) and (NL4). Consequently, Assumption (A1) holds. Moreover, (A2) follows directly from (NL3). So, finally, under (NL1)-(NL4) we conclude, using Theorem 7 and Proposition 4, that the asymptotic variance of the bootstrap particle filter is tight for any initial distribution $\chi$ such that $\chi(\mathsf{D}) > 0$.

5. Proofs. 
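5.1. Forgetting of the initial distribution.

Lemma 9. Assume (A1-2). Then for all $\gamma > 2/3$ there exist functions $\rho_\gamma :\, ]0,1[\, \to\, ]0,1[$ and $C_\gamma :\, ]0,1[\, \to \mathbb{R}_+$ such that for all $n \in \mathbb{N}^*$ and all $z_0^{n-1} \in \mathsf{Y}^{rn}$, where $r \in \mathbb{N}^*$ is as in (A1) and $z_i = y_{ir}^{(i+1)r-1}$, satisfying

$n^{-1} \sum_{i=0}^{n-1} \mathbf{1}_{\mathsf{K}}(z_i) \geq \gamma$,

all functions $f$ and $h$ in $\mathcal{F}_+(\mathsf{X})$, all finite measures $\chi$ and $\chi'$ in $\mathcal{M}(\mathsf{X}, \mathcal{X})$, and all $\eta \in\, ]0,1[$,

(39)  $\left| \Delta_{\chi,\chi'}(z_0^{n-1})(f, h) \right| \leq \rho_\gamma^n(\eta) \left\{ \chi L(z_0^{n-1}) f \times \chi' L(z_0^{n-1}) h + \chi' L(z_0^{n-1}) f \times \chi L(z_0^{n-1}) h \right\} + C_\gamma(\eta)\, \eta^n \|f\|_\infty \|h\|_\infty \prod_{i=0}^{n-1} \|L(z_i) \mathbf{1}_{\mathsf{X}}\|_\infty^2\, \chi(\mathsf{X}) \chi'(\mathsf{X})$,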



(40) 



xHzr')f) K^Hzrif 



xHzr')h \ j^Hzr')^ 



-,n-l\ 



<(l-p^(r/))-i 



X 2p"(77) + 



C,(r?)r?" II/IL ll/^ILniU' ||L(zi)lx|Lx(X)x'(X)^ 



oo ll'"lloo lli=0 



xL(zo«-^)/ X x'L(zo"-^)/i 



(41) 



'T /-,"-l\ 



xHzl-')h x'L(zo"-^)/i 



XL(4-1)/ x'L(zo"-^)/ 



< P'^M 



It /-,"— 1\ 



xL(zo"-^)/i , x'Mzi;-')h 



+ 



n-l\ 



xL(zo"^^)/ x'L(zo"-^)/ 



+ 



C,(^)r/" ||/i|L ll/lloo nr=o l|L(^^)lx||^ X(X)X'(X) 



n-l\ 



xL(zo""^)/ X x'Mz^~')h 



Proof. The proof is a straightforward adaptation of [13, Proposition 5]. □

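Lemma 10. Assume (A1). Then there exists a constant $\kappa > 0$ such that for all $\chi \in \mathcal{M}(\mathsf{D}, r)$ (where $\mathcal{M}(\mathsf{D}, r)$ is defined in (19)),

(42)  $\inf_{(k,m) \in \mathbb{N}^* \times \mathbb{N}} \kappa^{k+m}\, \chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} > 0$, P-a.s.,

and

(43)  $\inf_{(k,m) \in \mathbb{N}^* \times \mathbb{N}} \kappa^{k+m}\, \|L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}\|_\infty > 0$, P-a.s.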
Proof. To derive (42) we first establish that

(44)  $\liminf_{k+m \to \infty} (k+m)^{-1} \ln \chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} \geq -r\, \mathbb{E}\left( \ln^- \inf_{x \in \mathsf{D}} \delta_x L(Y_0^{r-1}) \mathbf{1}_{\mathsf{D}} \right) > -\infty$, P-a.s.,

where the last inequality follows from (A1)(iii). We now establish the first inequality in (44). Set $a_{k,m} = -k + \lfloor (k+m)/r \rfloor r$ and note that $-a_{k,m} \in \{-m, \ldots, -m + r - 1\}$. Then, write

(45) 
lnxL(y_^-l)lx 

[(fe+m)/rj-l 

> lnxL(y:r"">lD + 5^ In inf 4L(y::;;;;^')^-')lD 

i=0 

r-1 L(fc+m)/rJ-l 

> -^ln-xL(y-r+^)lD - j; In" mf 5,X(yr;';-,f +^)^-^)lo. 

i=0 i=0 

For $i \in \mathbb{N}$, set $[i]_r := i - \lfloor i/r \rfloor r$. With this notation, $a_{k,m} = [a_{k,m}]_r + \lfloor a_{k,m}/r \rfloor r$. Then, since $[i]_r \in \{0, \ldots, r-1\}$,

[(fc+m)/rj-l 

(46) - y: ^ ^^^^=::2T''"'")^o 

i=0 

[{k+m)/r]-l 

= - y In- inf 5MYi"'i'':^ti^"'-iT'^"~')iD 

r-1 [(fc+m)/rj-l 

>-y y In- inf <5,L(y-^?-/.*-L"^-/'-J+^)^- Vd 

j=0 i=0 

r-1 l{k+m)/r]-lak^m/'r\-l 

= -E E ln-mf4L(y-;r^^-^)lo, 

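where the last identity follows by reindexing the summation. We now plug (46) into (45); the ergodicity of the process $(Z_n)_{n \in \mathbb{Z}}$ (Assumption (A1)(i)) then implies, via Lemma 13, P-a.s.,

$\liminf_{k+m \to \infty} (k+m)^{-1} \ln \chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} \geq -\sum_{j=0}^{r-1} \mathbb{E}\left( \ln^- \inf_{x \in \mathsf{D}} \delta_x L(Y_j^{j+r-1}) \mathbf{1}_{\mathsf{D}} \right) = -r\, \mathbb{E}\left( \ln^- \inf_{x \in \mathsf{D}} \delta_x L(Y_0^{r-1}) \mathbf{1}_{\mathsf{D}} \right)$,

which shows (44). Now, choose a constant $\kappa$ such that

$-r\, \mathbb{E}\left( \ln^- \inf_{x \in \mathsf{D}} \delta_x L(Y_0^{r-1}) \mathbf{1}_{\mathsf{D}} \right) > -\ln \kappa > -\infty$.

According to (44), there exists a P-a.s. finite $\mathbb{N}^*$-valued random variable $N$ such that if $k + m \geq N$,

$\ln \chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} \geq (-\ln \kappa)(k + m)$,

which implies that

$\inf_{k+m \geq N} \kappa^{k+m}\, \chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} \geq 1$.

On the other hand, Assumption (A2) implies that for all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$, $\chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} > 0$, P-a.s. This completes the proof of (42). Finally, the proof of (43) follows by combining

$\|L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}\|_\infty \geq \chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}$

and (42). □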

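For all probability measures $\chi \in \mathcal{P}(\mathsf{X}, \mathcal{X})$, all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$, and all sequences $y_{-m}^k \in \mathsf{Y}^{m+k+1}$, define the set

(47)  $\mathcal{M}(y_{-m}^k)(\chi) := \left\{ \tilde{\chi} \in \mathcal{P}(\mathsf{X}, \mathcal{X}) : \|g(\cdot, y_k)\|_\infty \times \tilde{\chi} L(y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}} \geq (1/2)\, \chi L(y_{-m}^k) \mathbf{1}_{\mathsf{X}} \right\}$

of probability measures on $(\mathsf{X}, \mathcal{X})$ and note that this set is nonempty since $\chi \in \mathcal{M}(y_{-m}^k)(\chi)$. The choice of 1/2 in the definition of $\mathcal{M}(y_{-m}^k)(\chi)$ is irrelevant and this factor can be replaced by any constant strictly less than 1.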

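Proposition 11. Assume (A1-2). Then there exists a constant $\beta \in\, ]0,1[$ such that the following holds.

(i) For all probability measures $\chi$ and $\chi'$ in $\mathcal{M}(\mathsf{D}, r)$ there exists a P-a.s. finite random variable $C_{\chi,\chi'}$ such that for all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$ and all $\tilde{\chi} \in \mathcal{M}(Y_{-m}^k)(\chi)$,

$\left| \ln\left( \frac{\tilde{\chi} L(Y_{-m}^{k}) \mathbf{1}_{\mathsf{X}}}{\tilde{\chi} L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi' L(Y_{-m}^{k}) \mathbf{1}_{\mathsf{X}}}{\chi' L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) \right| \leq C_{\chi,\chi'} \beta^{k+m}$, P-a.s.

(ii) For all probability measures $\chi$ in $\mathcal{M}(\mathsf{D}, r)$ there exists a P-a.s. finite random variable $C_\chi$ such that for all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$,

$\left| \ln\left( \frac{\chi L(Y_{-m}^{k}) \mathbf{1}_{\mathsf{X}}}{\chi L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi L(Y_{-m-1}^{k}) \mathbf{1}_{\mathsf{X}}}{\chi L(Y_{-m-1}^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) \right| \leq C_\chi \beta^{k+m}$, P-a.s.

(iii) There exists a P-a.s. finite random variable $C$ such that for $m \in \mathbb{N}^*$, all probability measures $\chi$ and $\chi'$ in $\mathcal{P}(\mathsf{X}, \mathcal{X})$, and all $h \in \mathcal{F}(\mathsf{X})$,

$\frac{\left| \Delta_{\chi,\chi'}(Y_{-m}^{-1})(h, \mathbf{1}_{\mathsf{X}}) \right|}{\|L(Y_{-m}^{-1}) \mathbf{1}_{\mathsf{X}}\|_\infty^2} \leq C \beta^m \|h\|_\infty$, P-a.s.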

Proof. Proof of (i) and (ii). Let $\tilde{\chi} \in \mathcal{M}(Y_{-m}^k)(\chi)$. Recall the notation $Z_i = Y_{ir}^{(i+1)r-1}$ and consider the decompositions



xL(y^^)ix = xMY:}r'''^'~')Mz^Xni~r>^^i 



xHY: 



k 

-r. 

k-l\ 



■m 



ix = xL(y-„^/''J^"^)L(zl^/'-J-^^ 



[fc/rjr 

rk-1 



ll> 



[fc/rJr)^X' 



4m/rJ )^\^[fc/ 

where we make use of the convention (7) if necessary. 

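Choose $\gamma$ such that $2/3 < \gamma < \mathbb{P}(Z_0 \in \mathsf{K})$, where $\mathsf{K}$ is defined in (A1)(i). Assume that $(k,m) \in \mathbb{N}^* \times \mathbb{N}$ are both larger than $r$ and denote $b_{k,m} := \lfloor k/r \rfloor + \lfloor m/r \rfloor$. In addition, define the event

$\Omega_{k,m} := \left\{ b_{k,m}^{-1} \sum_{\ell = -\lfloor m/r \rfloor}^{\lfloor k/r \rfloor - 1} \mathbf{1}_{\mathsf{K}}(Z_\ell) \geq \gamma \right\}$.

By Lemma 9 (Eq. (40)) it holds for all $\eta \in\, ]0,1[$, on the event $\Omega_{k,m}$,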



(48) (l-p^(r?)) In 



/ xMYl 



'm) Ix 



UL(y'-„^)ix 



In 



x'L{Y':Jix 

x'MY^^r'nx, 



^:)^Km, ,^C,{r^)v''^-U:Yk)\\^Y[t'^mM:Y^^ 
< 2p^ (?/) + — 



loo 



xL(y^-i)ix X x'MYl:Jix 



h^^.. , ^c^ivh^^-Ul 



<2p;'=-(7?) + 



|5(-,^*) 



xL(y_^^)ix X x'L(y^'jix ' 



where 
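(a) follows from (40) and the bound $\delta_x L(Y_u^v) \mathbf{1}_{\mathsf{X}} \leq \prod_{\ell=u}^{v} \|g(\cdot, Y_\ell)\|_\infty$, valid for $u \leq v$, and

(b) follows from the fact that $\tilde{\chi} \in \mathcal{M}(Y_{-m}^k)(\chi)$.

Since, under (A1)(i), the sequence $(Z_n)_{n \in \mathbb{Z}}$ is ergodic and $\mathbb{P}(Z_0 \in \mathsf{K}) > \gamma$, Lemma 13 implies that

$\mathbb{P}\left( \bigcup_{j \geq 0} \bigcap_{(k,m) \in \mathbb{N}^* \times \mathbb{N},\ k+m \geq j} \Omega_{k,m} \right) = 1$.

Hence, there exists a P-a.s. finite integer-valued random variable $U$ such that (48) is satisfied for all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$ such that $k + m \geq U$.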

The lower bound obtained in Lemma 10 implies that there exists a constant $\kappa > 0$ such that for all probability measures $\chi$ and $\chi'$ in $\mathcal{M}(\mathsf{D}, r)$ and all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$, P-a.s.,

xMY':ji^>c^yK-(''+"^+'\ 

where C^^^' is a P-a.s. finite constant. 

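By plugging these bounds into (48) and using Lemma 14 with $\eta$ sufficiently small (note that (48) is satisfied for all $\eta \in\, ]0,1[$), we conclude that there exist a P-a.s. finite random variable $C_{\chi,\chi'}$ and a constant $\beta < 1$ such that for all $(k,m) \in \mathbb{N}^* \times \mathbb{N}$, P-a.s.,

$\left| \ln\left( \frac{\tilde{\chi} L(Y_{-m}^{k}) \mathbf{1}_{\mathsf{X}}}{\tilde{\chi} L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi' L(Y_{-m}^{k}) \mathbf{1}_{\mathsf{X}}}{\chi' L(Y_{-m}^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) \right| \leq C_{\chi,\chi'} \beta^{k+m}$,

which completes the proof of (i). Note that $\chi \in \mathcal{M}(Y_{-m}^k)(\chi)$ implies that the previous relation is satisfied with $\tilde{\chi} = \chi$.

The proof of (ii) follows the same lines as the proof of (i) and is omitted for brevity. ◁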



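Proof of (iii). As in the proof of (i), write

$\chi L(Y_{-m}^{-1}) h = \chi L(Y_{-m}^{-\lfloor m/r \rfloor r - 1}) L(Z_{-\lfloor m/r \rfloor}^{-1}) h$

and define the event

$\Omega_m := \left\{ \lfloor m/r \rfloor^{-1} \sum_{i = -\lfloor m/r \rfloor}^{-1} \mathbf{1}_{\mathsf{K}}(Z_i) \geq \gamma \right\}$.

By Lemma 9 (Eq. (41)) it holds, on the event $\Omega_m$,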



(49) 



xL(y-^)ix x'L(y-^)ix 



< 2 



oo r 7 



Im/rJ 



iv) + 



C^{r])v 



[m/rj 



. lli=-m \\9v^ ^ij 



\oo 



xL(y-\)ix X x'L(y-„Oix 



where we used that for $u \leq v$, $\delta_x L(Y_u^v) \mathbf{1}_{\mathsf{X}} \leq \prod_{\ell=u}^{v} \|g(\cdot, Y_\ell)\|_\infty$. Under (A1)(i), Birkhoff's ergodic theorem ensures that $\mathbb{P}(\liminf_{m \to \infty} \Omega_m) = 1$; therefore, there exists a P-a.s. finite random variable $U$ such that (49) is satisfied for $m \geq U$. Then, for $m \geq U$,



(50) 



A^,^,(y„-^)(/i,ix)| 

l|L(y-^)lxP 

_ xL(ic^)ix X VL(yr^)ix 
l|L(y-^)ixP 



X-L{Yl^)h x'MYli)h 



< 2 



xL(y_-^)ix x'L(y-^)ix 



OD r7 



|L(yr^,)ixP 



where we have used that $\chi L(Y_{-m}^{-1}) \mathbf{1}_{\mathsf{X}} \leq \|L(Y_{-m}^{-1}) \mathbf{1}_{\mathsf{X}}\|_\infty$. By Lemma 10, Eq. (43), there exist a constant $\kappa > 0$ and a P-a.s. finite random variable $C$ such that

$\|L(Y_{-m}^{-1}) \mathbf{1}_{\mathsf{X}}\|_\infty \geq C \kappa^{-m}$, P-a.s.

Finally, we complete the proof by inserting this bound into (50) and applying Lemma 14 to the right-hand side of the resulting inequality. □



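5.2. Convergence of the log-likelihood.

Lemma 12. Assume (A1-2). Then, P-a.s.,

(51)  $\lim_{n \to \infty} n^{-1} \ln \|L(Y_0^{n}) \mathbf{1}_{\mathsf{X}}\|_\infty = \ell_\infty$,

(52)  $\lim_{n \to \infty} n^{-1} \ln \|L(Y_{-n}^{-1}) \mathbf{1}_{\mathsf{X}}\|_\infty = \ell_\infty$,

(53)  $\lim_{n \to \infty} n^{-1} \sum_{k=1}^{n} \ln \pi(Y_{-\infty}^{-k-1})(Y_{-k}) = \ell_\infty$,

where $\ell_\infty$ is defined in (27).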






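Proof. Proof of (51). Let $(\alpha_n)_{n \in \mathbb{N}^*}$ be a non-decreasing sequence such that $\lim_{n \to \infty} \alpha_n = 1$ and for any $n \in \mathbb{N}^*$, $\alpha_n \geq 1/2$. For all $n \in \mathbb{N}$, choose $x_n \in \mathsf{X}$ such that

(54)  $\alpha_n \|L(Y_0^n) \mathbf{1}_{\mathsf{X}}\|_\infty \leq \delta_{x_n} L(Y_0^n) \mathbf{1}_{\mathsf{X}} \leq \|L(Y_0^n) \mathbf{1}_{\mathsf{X}}\|_\infty$.

Note that for all $k \in \mathbb{N}^*$,

(55)  $\delta_{x_{k-1}} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}} \geq \alpha_{k-1} \|L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}\|_\infty \geq \alpha_{k-1}\, \delta_{x_k} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}$.

On the other hand, for all probability measures $\chi \in \mathcal{P}(\mathsf{X}, \mathcal{X})$ it holds that

(56)  $\delta_{x_k} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}} \overset{(a)}{\geq} \frac{\delta_{x_k} L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\|g(\cdot, Y_k)\|_\infty} \overset{(b)}{\geq} \alpha_k \frac{\|L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}\|_\infty}{\|g(\cdot, Y_k)\|_\infty} \geq \alpha_k \frac{\chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\|g(\cdot, Y_k)\|_\infty}$,

where (a) follows from the bound $\delta_{x_k} L(Y_0^{k}) \mathbf{1}_{\mathsf{X}} \leq \|g(\cdot, Y_k)\|_\infty\, \delta_{x_k} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}$ and (b) stems from the definition (54) of $\alpha_n$. Then,

$0 \leq n^{-1} \left( \ln \|L(Y_0^n) \mathbf{1}_{\mathsf{X}}\|_\infty - \ln \chi L(Y_0^n) \mathbf{1}_{\mathsf{X}} \right)$
$\quad \leq -n^{-1} \ln \alpha_n + n^{-1} \left( \ln\left( \alpha_n \|L(Y_0^n) \mathbf{1}_{\mathsf{X}}\|_\infty \right) - \ln \chi L(Y_0^n) \mathbf{1}_{\mathsf{X}} \right)$
$\quad \leq -n^{-1} \ln \alpha_n + n^{-1} \left( \ln \delta_{x_n} L(Y_0^n) \mathbf{1}_{\mathsf{X}} - \ln \chi L(Y_0^n) \mathbf{1}_{\mathsf{X}} \right)$
$\quad = -n^{-1} \ln \alpha_n + n^{-1} \left( \ln \delta_{x_0} L(Y_0) \mathbf{1}_{\mathsf{X}} - \ln \chi L(Y_0) \mathbf{1}_{\mathsf{X}} \right)$

(57)  $\quad\quad + n^{-1} \sum_{k=1}^{n} \left\{ \ln\left( \frac{\delta_{x_k} L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\delta_{x_{k-1}} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\chi L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) \right\}$.

For each term in the sum it holds, by (55),

$\ln\left( \frac{\delta_{x_k} L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\delta_{x_{k-1}} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\chi L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) \leq -\ln \alpha_{k-1} + \ln\left( \frac{\delta_{x_k} L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\delta_{x_k} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\chi L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right)$.

For all $k \in \mathbb{N}^*$, (56) implies that

$\|g(\cdot, Y_k)\|_\infty \times \delta_{x_k} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}} \geq \alpha_k\, \chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}} \geq \frac{1}{2}\, \chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}$,

so that $\delta_{x_k}$ belongs to the set $\mathcal{M}(Y_0^{k})(\chi)$ (defined in (47)). Proposition 11(i) then provides a constant $\beta \in\, ]0,1[$ and a P-a.s. finite random variable $C_\chi$ such that

(58)  $\ln\left( \frac{\delta_{x_k} L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\delta_{x_k} L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) - \ln\left( \frac{\chi L(Y_0^{k}) \mathbf{1}_{\mathsf{X}}}{\chi L(Y_0^{k-1}) \mathbf{1}_{\mathsf{X}}} \right) \leq C_\chi \beta^{k}$.

Finally, the statement (51) follows by plugging the bound (58) into (57), letting $n$ tend to infinity, and using (26). ◁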

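Proof of (52). For all $(p, n) \in \mathbb{N}^2$ such that $p < n$, define $W_{p,n} := \ln \|L(Y_p^{n-1}) \mathbf{1}_{\mathsf{X}}\|_\infty$ and $\tilde{W}_{p,n} := \ln \|L(Y_{-n}^{-p-1}) \mathbf{1}_{\mathsf{X}}\|_\infty$. Note that these two sequences are subadditive in the sense that for all $(p, n) \in \mathbb{N}^2$ such that $p < n$,

$W_{0,n} \leq W_{0,p} + W_{p,n}$,
$\tilde{W}_{0,n} \leq \tilde{W}_{0,p} + \tilde{W}_{p,n}$.

Finally, for all $x \in \mathsf{D}$, $m \in \mathbb{N}$, and $y_0^{mr-1} \in \mathsf{Y}^{mr}$, it holds that

(59)  $\|L(y_0^{mr-1}) \mathbf{1}_{\mathsf{X}}\|_\infty \geq \delta_x L(y_0^{mr-1}) \mathbf{1}_{\mathsf{X}} \geq \prod_{i=0}^{m-1} \inf_{x \in \mathsf{D}} \delta_x L(y_{ir}^{(i+1)r-1}) \mathbf{1}_{\mathsf{D}}$.

Using the stationarity of the observation process $(Y_k)_{k \in \mathbb{Z}}$ we get, via Assumption (A1)(iii), for all $m \in \mathbb{N}^*$,

(60)  $(mr)^{-1} \mathbb{E}(W_{0,mr}) = (mr)^{-1} \mathbb{E}(\tilde{W}_{0,mr}) \geq (mr)^{-1} \mathbb{E}\left( \ln \|L(Y_0^{mr-1}) \mathbf{1}_{\mathsf{X}}\|_\infty \right) \geq r^{-1} \mathbb{E}\left( \ln \inf_{x \in \mathsf{D}} \delta_x L(Y_0^{r-1}) \mathbf{1}_{\mathsf{D}} \right) > -\infty$.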

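The sequences $(\mathbb{E}(W_{0,n}))_{n \in \mathbb{N}^*}$ and $(\mathbb{E}(\tilde{W}_{0,n}))_{n \in \mathbb{N}^*}$ are subadditive; Fekete's lemma thus implies that the sequences $(n^{-1} \mathbb{E}(W_{0,n}))_{n \in \mathbb{N}^*}$ and $(n^{-1} \mathbb{E}(\tilde{W}_{0,n}))_{n \in \mathbb{N}^*}$ have limits in $[-\infty, \infty[$ and that

$\lim_{n \to \infty} n^{-1} \mathbb{E}(W_{0,n}) = \lim_{n \to \infty} n^{-1} \mathbb{E}(\tilde{W}_{0,n}) = \inf_{n \in \mathbb{N}^*} n^{-1} \mathbb{E}(W_{0,n}) = \inf_{n \in \mathbb{N}^*} n^{-1} \mathbb{E}(\tilde{W}_{0,n})$.

However, by (60) there exists a subsequence that is bounded away from $-\infty$, showing that

$\inf_{n \in \mathbb{N}^*} n^{-1} \mathbb{E}(W_{0,n}) = \lim_{n \to \infty} n^{-1} \mathbb{E}(W_{0,n}) > -\infty$,

$\inf_{n \in \mathbb{N}^*} n^{-1} \mathbb{E}(\tilde{W}_{0,n}) = \lim_{n \to \infty} n^{-1} \mathbb{E}(\tilde{W}_{0,n}) > -\infty$.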
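
For reference, Fekete's lemma and the short argument behind it can be stated as follows. The notation $(a_n)$ is generic and introduced only for this sketch; in the proof above it would be applied with $a_n$ equal to the expectation of the relevant log-norm, the subadditivity of the expectations following from the pathwise subadditivity together with stationarity.

```latex
% Fekete's lemma for a real sequence (a_n)_{n \geq 1}:
a_{n+m} \leq a_n + a_m \quad \text{for all } n, m \geq 1
\qquad\Longrightarrow\qquad
\lim_{n \to \infty} \frac{a_n}{n} = \inf_{n \geq 1} \frac{a_n}{n} \in [-\infty, \infty).

% Sketch of the argument: fix p \geq 1, set a_0 := 0, and write n = q p + s
% with 0 \leq s < p. Subadditivity gives
a_n \leq q\, a_p + a_s
\qquad\Longrightarrow\qquad
\limsup_{n \to \infty} \frac{a_n}{n} \leq \frac{a_p}{p},
% and taking the infimum over p yields
% \limsup_n a_n / n \leq \inf_p a_p / p \leq \liminf_n a_n / n.
```

The lemma alone allows the limit to equal $-\infty$; it is the lower bound (60) along the subsequence $n = mr$ that rules this out.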

Now, by applying Kingman's subadditive ergodic theorem and using again that $\mathbb{E}(W_{0,k}) = \mathbb{E}(\tilde{W}_{0,k})$ under stationarity, we obtain

$\lim_{n \to \infty} n^{-1} \tilde{W}_{0,n} = \inf_{n \in \mathbb{N}^*} n^{-1} \mathbb{E}(\tilde{W}_{0,n}) = \inf_{n \in \mathbb{N}^*} n^{-1} \mathbb{E}(W_{0,n}) = \lim_{n \to \infty} n^{-1} W_{0,n} = \ell_\infty$, P-a.s.,

where the last limit follows from (51). This completes the proof of statement (52). ◁

Proof of (53). Since $\mathbb{E}(|\ln \pi(Y_{-\infty}^{-1})(Y_0)|) < \infty$ and the process $(Y_k)_{k \in \mathbb{Z}}$ is stationary and ergodic, (53) follows from Birkhoff's ergodic theorem. ◁

□

APPENDIX A: TECHNICAL LEMMAS 

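Lemma 13. If $(U_n)_{n \in \mathbb{Z}}$ is a stationary and ergodic sequence of random variables such that $\mathbb{E}(|U_0|) < \infty$, then

(61)  $\lim_{k+m \to \infty} (k+m)^{-1} \sum_{\ell=-m}^{k-1} U_\ell = \mathbb{E}(U_0)$, P-a.s.

Proof. Denote

$\Omega_1 := \left\{ \omega \in \Omega;\ \lim_{k+m \to \infty} (k+m)^{-1} \sum_{\ell=-m}^{k-1} U_\ell(\omega) = \mathbb{E}(U_0) \right\}$,

$\Omega_2 := \left\{ \omega \in \Omega;\ \lim_{k \to \infty} \frac{\sum_{\ell=0}^{k-1} U_\ell(\omega)}{k} = \lim_{m \to \infty} \frac{\sum_{\ell=-m}^{-1} U_\ell(\omega)}{m} = \mathbb{E}(U_0) \right\}$.

By Birkhoff's ergodic theorem, $\mathbb{P}(\Omega_2) = 1$. To obtain (61), it is thus sufficient to show that $\Omega_1^c \cap \Omega_2 = \emptyset$. The proof is by contradiction. Assume $\Omega_1^c \cap \Omega_2 \neq \emptyset$, so that there exists $\omega \in \Omega_1^c \cap \Omega_2$. For such $\omega$, the fact that $\omega \notin \Omega_1$ implies that there exist a positive number $\epsilon(\omega) > 0$ and integer-valued sequences $(k_n(\omega))_{n \in \mathbb{N}}$ and $(m_n(\omega))_{n \in \mathbb{N}}$ such that $k_n(\omega) + m_n(\omega) \geq n$ and for all $n \geq 0$,

(62)  $\left| \frac{\sum_{\ell=-m_n(\omega)}^{k_n(\omega)-1} U_\ell(\omega)}{k_n(\omega) + m_n(\omega)} - \mathbb{E}(U_0) \right| \geq \epsilon(\omega)$.

Consider the following decomposition:

(63)  $\frac{\sum_{\ell=-m_n(\omega)}^{k_n(\omega)-1} U_\ell(\omega)}{k_n(\omega) + m_n(\omega)} = \frac{m_n(\omega)}{k_n(\omega) + m_n(\omega)} \cdot \frac{\sum_{\ell=-m_n(\omega)}^{-1} U_\ell(\omega)}{m_n(\omega)} + \frac{k_n(\omega)}{k_n(\omega) + m_n(\omega)} \cdot \frac{\sum_{\ell=0}^{k_n(\omega)-1} U_\ell(\omega)}{k_n(\omega)}$.

First, assume that $(k_n(\omega))_{n \in \mathbb{N}}$ is bounded. Since $k_n(\omega) + m_n(\omega) \geq n$, it follows that $m_n(\omega)$ tends to infinity, implying that

(64)  $\lim_{n \to \infty} \frac{k_n(\omega)}{k_n(\omega) + m_n(\omega)} = 1 - \lim_{n \to \infty} \frac{m_n(\omega)}{k_n(\omega) + m_n(\omega)} = 0$,

whereas $\sum_{\ell=0}^{k_n(\omega)-1} U_\ell(\omega) / k_n(\omega)$ remains bounded. However, since $\omega \in \Omega_2$ and $\lim_{n \to \infty} m_n(\omega) = \infty$,

$\lim_{n \to \infty} \frac{\sum_{\ell=-m_n(\omega)}^{-1} U_\ell(\omega)}{m_n(\omega)} = \mathbb{E}(U_0)$,

which implies, together with (64), that

$\lim_{n \to \infty} \frac{\sum_{\ell=-m_n(\omega)}^{k_n(\omega)-1} U_\ell(\omega)}{k_n(\omega) + m_n(\omega)} = \mathbb{E}(U_0)$.

This contradicts (62). Using similar arguments one proves that $(m_n(\omega))_{n \in \mathbb{N}}$ is unbounded as well. Hence, we have proved that neither $(k_n(\omega))_{n \in \mathbb{N}}$ nor $(m_n(\omega))_{n \in \mathbb{N}}$ is bounded.

Then, by extracting a subsequence if necessary, one may assume that $\lim_{n \to \infty} k_n(\omega) = \lim_{n \to \infty} m_n(\omega) = \infty$. Since $\omega \in \Omega_2$, this implies that

$\lim_{n \to \infty} \frac{\sum_{\ell=-m_n(\omega)}^{-1} U_\ell(\omega)}{m_n(\omega)} = \lim_{n \to \infty} \frac{\sum_{\ell=0}^{k_n(\omega)-1} U_\ell(\omega)}{k_n(\omega)} = \mathbb{E}(U_0)$.

Combining this with (63), we obtain that

$\lim_{n \to \infty} \frac{\sum_{\ell=-m_n(\omega)}^{k_n(\omega)-1} U_\ell(\omega)}{k_n(\omega) + m_n(\omega)} = \mathbb{E}(U_0)$,

which again contradicts (62). Finally, $\Omega_1^c \cap \Omega_2 = \emptyset$; and since $\mathbb{P}(\Omega_2) = 1$, we obtain that $\mathbb{P}(\Omega_1) = 1$. The proof is completed. □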

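Lemma 14. Let $(U_k)_{k \in \mathbb{Z}}$, $(V_k)_{k \in \mathbb{Z}}$, and $(W_k)_{k \in \mathbb{Z}}$ be stationary sequences such that

$\mathbb{E}(\ln^+ U_0) < \infty$, $\mathbb{E}(\ln^+ V_0) < \infty$, $\mathbb{E}(\ln^+ W_0) < \infty$.

Then for all $\eta$ and $\rho$ in $]0,1[$ such that $-\ln \eta > \mathbb{E}(\ln^+ V_0)$ there exist a P-a.s. finite random variable $C$ and a constant $\beta \in\, ]0,1[$ such that for all $k \in \mathbb{N}^*$ and $m \in \mathbb{N}$, P-a.s.,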

( ^-'"^ \ 



+ r]''+"'W.m [ llVi]Uk< Cli' 



\i=-m / 

Proof. See [13, Lemma 6]. □